Preface


Data is changing the way society and technology evolve. With the advent of IoT,
Big Data, ML and AI, technology is expected to develop rapidly towards more
human-centric applications. The finance and insurance sectors are no exception,
and FinTech and InsuranceTech developments are now in a phase of creating
unique offerings.

It is very important to have a common understanding of the actual conditions
in the financial and insurance sectors, and of how technology can help advance
those conditions in a positive direction. Discussing the principles of the modern
economy that make the modern financial sector and FinTech the most disruptive
areas in today's global economy leads to a better understanding of the field.

Data-driven approaches open up many opportunities to activate new channels of
innovation on the local and global scale, while at the same time accelerating the
emergence of more disruptive human-centric services. Data-driven human-centric
applications are also the result of a shared vision arising from the natural
evolution of technology and society. Experts in the financial and insurance
sectors are observing a dramatic change in how people think about the global
economy, while technology provides the instruments for new ways of understanding
it, offering a common vision and identifying impacts in finance and insurance.

The INFINITECH book series addresses the need for clear information on the
foundations, principles and technologies of the field. It is aimed at both technical
and non-technical experts who participate in financial and insurance processes, and
it addresses the constant need for innovation and new services across banks and
insurance organizations.
Who Should Read This Book?


Financial & Insurance Regulators 


A unique offering for non-technical experts who participate in the financial
regulatory process. It covers the core services that enable innovation and new
services to be shared across banks and insurance organizations without exchanging
any customer data.


General Public & Students 


An understanding of the future of FinTechs, their services and their ability
to identify different methodologies and indicators from a human perspective.


Entrepreneurs and SMEs 


Powerful tools to innovate, increase opportunities and help small businesses and
entrepreneurs realize their full innovation potential, provided there is good
participation across the banking and insurance sectors.


Technical Experts & Software Developers 


A guidebook to legacy, open-source and non-open-source technologies that includes
the most recent European experiences in innovating technology for the financial
and banking sectors.


What is Addressed in the Book Series? 


“Concepts and Design Thinking Innovation addressing the Global 
Financial Needs” 


In the first part of the INFINITECH book series we begin by discussing the
principles of the modern economy that make the modern financial sector and FinTech
the most disruptive areas in today's global economy. INFINITECH envisions many
opportunities emerging for activating new channels of innovation on the local and
global scale, while at the same time accelerating opportunities for more disruptive
user-centric services. INFINITECH is also the result of a shared vision from a
representative global group of experts, providing a common vision and identifying
impacts in the financial and insurance sectors.


“Methods and Design Principles for Financial Innovation, Explaining the 
Supply Side for Interoperability in Finance- and Insurance-Tech” 


In the second part of the series we review the basic concepts of FinTech, referring
to the diverse uses of technology that underpin the delivery of financial services.
The demand and supply sides of the financial sector are presented. We also discuss
why FinTech is the focus of industry nowadays and what the waves of digitization
mean. Financial technology (FinTech) and insurance technology (InsuranceTech) are
rapidly transforming the financial and insurance services industry. We provide an
overview of the Reference Architecture (RA) for BigData, IoT and AI applications in
the financial and insurance sectors (INFINITECH-RA). Moreover, this book reviews the
concept of innovation and its application in INFINITECH, and it presents practical
examples of innovative technologies that the project provides for the financial sector.




“Technical Financial Innovation, Solving the Interoperability 
Problems of Europe” 


The third book begins by defining FinTech as the use of technology to underpin
the delivery of financial services. It further discusses why FinTech is the focus
of industry nowadays, and how the waves of digitization, financial technology
(FinTech) and insurance technology (InsuranceTech) are rapidly transforming the
financial and insurance services industry. The book introduces technology assets
that follow the Reference Architecture (RA) for BigData, IoT and AI applications.
The series of assets also covers the domain areas in which applications from the
INFINITECH innovation project are described, together with the concept of
innovation for the financial sector. Further, we describe the INFINITECH Marketplace
and its components, including details of available assets. Finally, we describe
solutions developed in INFINITECH.


What is Covered in this INFINITECH Part III Book?


“Technical Financial Innovation, Solving the Interoperability 
Problems of Europe.” 


Technology frameworks and testbed tools (sandboxes) are popular these days. From
the point of view of deployment and testing, technology Pilots are defined in terms
of the resources used to deploy solutions (infrastructure) and Sandboxes
(components). The different technologies developed within the context of the
INFINITECH way initiative are considered as a bill of materials of the resources
needed to build demonstrators, proofs of concept and prototype solutions. This
information can be used as input for configurators and cost structures to set up
the testbeds. It is therefore valuable to organizations from IT to financial and
procurement departments.

The third book begins by defining FinTech as the use of technology to underpin
the delivery of financial services. It further discusses why FinTech is the focus
of industry nowadays, and how the waves of digitization, Financial Technology
(FinTech) and Insurance Technology (InsuranceTech) are rapidly transforming the
financial and insurance services industry. The book introduces technology assets
that follow the Reference Architecture (RA) for BigData, IoT and AI applications.
The series of assets also covers the domain areas in which applications from the
INFINITECH innovation project are described, together with the concept of
innovation for the financial sector. Further, we describe the INFINITECH Marketplace
and its components, including details of available assets. Finally, we describe
solutions developed in INFINITECH.
Abstract


Technology frameworks and testbed tools (sandboxes) are popular these days. From
the point of view of deployment and testing, technology Pilots are defined in terms
of the resources used to deploy solutions (infrastructure) and Sandboxes
(components).

The different technologies developed within the context of the INFINITECH way
initiative are considered as a bill of materials of the resources needed to build
demonstrators, proofs of concept and prototype solutions. This information can be
used as input for configurators and cost structures to set up the testbeds. It is
therefore valuable to organizations from IT to financial and procurement departments.

The third book begins by defining FinTech as the use of technology to underpin
the delivery of financial services. It further discusses why FinTech is the focus of
industry nowadays, and how the waves of digitization, Financial Technology (FinTech)
and Insurance Technology (InsuranceTech) are rapidly transforming the financial and
insurance services industry.

The book introduces technology assets that follow the Reference Architecture (RA)
for BigData, IoT and AI applications. The series of assets also covers the domain
areas in which applications from the INFINITECH innovation project are described,
together with the concept of innovation for the financial sector. Further, we
describe the INFINITECH Marketplace and its components, including details of
available assets. Finally, we describe solutions developed in INFINITECH.
Why FinTech


The term “FinTech” has been used for a few years to describe the transformation
taking place in the financial services markets of both the developed and the
developing world. At the same time, forms of payment such as cheques are being
displaced by cards, and cards are in turn being disrupted by contactless payments
and digital wallets, which are now becoming the norm. Juniper defines FinTech
as: The use of technology to underpin the delivery of financial services. [The Future of 
FinTech — The New Standard] 

The emergence of FinTech is increasingly blurring the lines between two previously
very distinct sectors: Financial Services (FS) and Technology (Tech). FinTech brings
together the best of both sectors to disrupt the financial services industry by
enhancing customer experience, increasing the speed of service and reducing
operating costs through digitization. [Deloitte UK Human Capital]

Historically, the financial and insurance services sectors have been quite resistant
to technology disruption. However, this is no longer the case, as the waves of
digitization, Financial Technology (FinTech) and Insurance Technology
(InsuranceTech) are rapidly transforming the financial and insurance services
industry [Dietz16], [PwC17]. This is evident in the momentum and tangible growth
of FinTech/InsuranceTech enterprises and in the volume of relevant investments.
Over $23 billion of venture capital and growth equity was allocated to FinTech
innovations during 2011-2014, with $12.2 billion deployed in 2014 alone.
Moreover, a recent McKinsey & Co study revealed that the number of FinTech startups
exceeded 2,000 in 2016, up from approximately 800 in 2015. Furthermore, the vast
majority of global banks and investment firms have already planned to increase
their FinTech/InsuranceTech investments, with a view to yielding a 20% average
return on them. Beyond FinTech/InsuranceTech, financial institutions and insurance
organizations are investing heavily in their digital transformation as a means of
improving the efficiency of their business processes and optimizing their decision
making.

The vast majority of digital transformation applications for the finance and
insurance sectors are data-intensive. This holds for applications in areas such as
retail banking, corporate banking, payments, investment banking, capital markets,
insurance services, financial services security and more.

All these applications leverage very large datasets from legacy banking systems
(e.g., customer accounts, customer transactions, investment portfolio data). They
combine these datasets with other data sources such as financial markets data,
regulatory datasets, real-time retail transactions and more. With the advent of
Internet-of-Things (IoT) devices and applications (e.g., fitness trackers,
smartphones, smart home devices), several FinTech/InsuranceTech applications can
take advantage of contextual data to offer better quality of service at a more
competitive cost (e.g., personalized healthcare insurance based on medical devices
and improved car insurance based on connected car sensors).

Furthermore, alternative data sources (e.g., social media and online news) provide
opportunities for new, more automated, personalized and accurate services. Moreover,
recent advances in data storage and processing technologies (including advances in
Artificial Intelligence (AI) and blockchain technologies) provide new opportunities
for exploiting the above-listed massive datasets. These advances are expected to
stimulate more investment in digital finance/insurance services.

Overall, financial and insurance organizations take advantage of BigData and IoT
technologies to improve the accuracy and cost-effectiveness of their services, as
well as the overall value that they provide to their customers. Nevertheless,
despite early deployments, many challenges still have to be overcome before the
full potential of BigData/IoT/AI in the finance and insurance sectors can be
leveraged. Leveraging that potential could also act as a catalyst for attracting
more investment and for significantly improving the competitiveness of enterprises
in these sectors.

In particular, financial institutions and insurance organizations are currently
faced with the following challenges:


* Data Fragmentation and Interoperability Barriers.
* Limitations for Cost-Effective Real-Time Analytics.
* Regulatory Barriers.
* Data Availability Barriers.
* Lack of Blueprint Architectures for BigData Applications.
* No Validated Business Models.
In order to address these challenges and leverage the full potential of BigData,
IoT and AI in finance/insurance, developments are needed in several parallel
streams, including:


* Technical/Technological Developments. 
* Development of Experimentation Infrastructures (Testbeds & Sandboxes). 
* Validation of Novel Business Models. 
Chapter 1


INFINITECH Reference 
Architecture Overview 


1.1 Reference Architecture for BigData/IoT/AI in
Finance/Insurance


The aim of this book series is to review and validate the Reference Architecture
(RA) for BigData, IoT and AI applications in the financial and insurance sectors
(INFINITECH-RA). The RA will serve as a blueprint for the rapid and cost-effective
development and deployment of solutions. The INFINITECH-RA will specify a set of
building blocks that support advanced BigData, AI and IoT applications. These
building blocks will support the following:

* scalable, unified and interoperable data collection from different sources and
databases, e.g., OLTP (On-Line Transaction Processing) and OLAP (On-Line Analytical
Processing) systems, Data Lakes, SQL databases, NoSQL databases and alternative
data sources;
* efficient real-time predictive analytics;
* multi-channel/omni-channel interactions;
* data governance functionalities;
* interoperable data sharing and interactions between stakeholders of the
financial and insurance value chains.

The INFINITECH-RA will specify the structuring principles that drive the
integration of these building blocks into real-life solutions. It will serve as a
basis for designing, developing and deploying novel BigData, AI and IoT solutions
that feature "SHARP" (Smart, Holistic, Autonomy, Personalized and Regulatory
Compliance) characteristics. The project will also provide a number of blueprints
for developing and deploying solutions aligned to the INFINITECH-RA. The blueprints
will elaborate different designs and deployment configurations, tailored to the
needs of specific solutions. Both the INFINITECH-RA and its blueprints will address
functionalities that are prioritized in the SRIA (Strategic Research and Innovation
Agenda) of the BDVA (BigData Value Association). At the same time, they consider and
consolidate concepts from RAs introduced by relevant standardization bodies and
associations. [the proposal]

This work provides an overview of the technical requirements and specifications
driving the project, as well as the detailed specification of the INFINITECH data
models, technology/regulatory building blocks and the INFINITECH-RA. The following
objectives are addressed:

(i) to articulate stakeholders' requirements regarding BigData and IoT-based
services with SHARP properties in the financial and insurance sectors;
(ii) to refine and detail the SHARP properties of various services in the target
sectors;
(iii) to analyze the background BigData and IoT platforms that will support the
pilots and testbeds of the project, and to detail how they will be enhanced in order
to empower the INFINITECH vision;
(iv) to specify the security and regulatory compliance requirements of the
INFINITECH services, together with the relevant solutions to be used in the project;
(v) to specify the capabilities of the testbeds that will support the development
and deployment of SHARP services;
(vi) to specify the INFINITECH-RA.


1.2 INFINITECH Reference Architecture 


The Reference Architecture (RA) of the INFINITECH project was designed to support
the development of Smart, Autonomous and Personalized Services in the European
Finance and Insurance Services Ecosystem. The INFINITECH partners selected the
“4+1” architectural view model as the methodology for working on the RA, as
presented in the document. The methodology is based on five different views from
which the structure of the system can be analyzed: the logical view, process view,
development view, physical view and scenarios. Moreover, the document demonstrates
that this model properly covers all the functionalities of the INFINITECH
environment. [D2.13]

The State-of-the-Art survey shows that some existing Reference Architectures
provide substantial input to INFINITECH. One example is the pipeline and workflow
approach to supporting the functionalities of the different Pilots and Use Cases of
the project. Relevant inputs to the task have been considered, in particular the
use cases from the task “User Stories and Analysis of Stakeholders' Requirements”
and a cross-reference matrix. Finally, a layered, high-level reference view and a
detailed logical view of the RA are presented. Different layers have been
identified: infrastructure, data management and protection, data processing and
architecture, analytics, interface, and presentation/visualization. The layers are
mainly a means of classifying the building blocks so that they can be combined into
different workflows. The resulting RA provides a schema for building solid
workflows. It ensures full communication and interaction between all the building
blocks, from the data source level (at the infrastructure level of the
organizations) up through the Data Stores and Processing Analytics to presentation
and visualization applications. High-Performance Computing (HPC) can be distributed
across nodes within the platform, supporting a high degree of scalability.
[D2.12 to D2.15]
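The layers above act as a classification scheme for building blocks. The sketch below models them as an ordered enumeration and checks that a workflow runs from the data source level up to presentation. It is a minimal sketch under stated assumptions. The building-block names are hypothetical. The rule that a workflow never drops back to a lower layer is a simplification introduced here and is not an INFINITECH-RA rule.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layers of the RA as listed in the text, ordered bottom-up."""
    INFRASTRUCTURE = 1
    DATA_MANAGEMENT_AND_PROTECTION = 2
    DATA_PROCESSING_AND_ARCHITECTURE = 3
    ANALYTICS = 4
    INTERFACE = 5
    PRESENTATION_VISUALIZATION = 6


# Hypothetical building blocks, each classified under exactly one layer.
BUILDING_BLOCKS = {
    "core_banking_db": Layer.INFRASTRUCTURE,
    "anonymizer": Layer.DATA_MANAGEMENT_AND_PROTECTION,
    "stream_processor": Layer.DATA_PROCESSING_AND_ARCHITECTURE,
    "fraud_model": Layer.ANALYTICS,
    "rest_api": Layer.INTERFACE,
    "dashboard": Layer.PRESENTATION_VISUALIZATION,
}


def is_bottom_up_workflow(blocks: list[str]) -> bool:
    """True if the workflow starts at the data source (infrastructure),
    ends at presentation, and never moves back to a lower layer
    (a simplifying assumption; skipping layers is allowed)."""
    layers = [BUILDING_BLOCKS[b] for b in blocks]
    if not layers:
        return False
    starts_at_source = layers[0] == Layer.INFRASTRUCTURE
    ends_at_presentation = layers[-1] == Layer.PRESENTATION_VISUALIZATION
    monotonic = all(a <= b for a, b in zip(layers, layers[1:]))
    return starts_at_source and ends_at_presentation and monotonic


print(is_bottom_up_workflow(["core_banking_db", "anonymizer", "stream_processor",
                             "fraud_model", "rest_api", "dashboard"]))  # True
print(is_bottom_up_workflow(["core_banking_db", "fraud_model",
                             "anonymizer", "dashboard"]))  # False
```

An integer-valued enumeration keeps the layer order explicit, so the workflow check is a plain comparison between consecutive blocks.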

Moreover, the RA considers external data sources such as public and private Data
Lakes, IoT networks and Blockchains. A list of identified building blocks provides
the basic functionalities of the INFINITECH reference sandbox for a more general
class of use cases. Some building blocks correspond to existing technologies. Other
components are designed, implemented and integrated during the INFINITECH project,
as explained in public deliverables and summarized in this book series. [D2.13]

The validity of the RA has been demonstrated by mapping the workflows of the
project's pilots onto it, which confirms the conceptual approach of the INFINITECH
RA. The RA is a living solution that is continuously verified as the project
develops and against the different pilots. Moreover, the Consortium will promote
the RA, along with its methodology and technological advancements, during project
dissemination. It will be presented as a more general solution applicable to a
broader set of use cases beyond the original Financial and Insurance scope,
wherever Big Data and AI are to be considered. [D2.13]

RAs are designed to facilitate the design and development of concrete technological
architectures, mostly in the IT domain. They reduce risks through proven components
and improve overall communication within an organization. The real drawbacks and
benefits of RAs have been analyzed with respect to the project's Pilots: the RA
facilitated the development of concrete IT architectures and reduced maintenance
costs. In general, the value of RAs can be summarized in the following points:
[D2.15]


* Reduction of development and maintenance costs of systems. 
* Facilitation of communication between important stakeholders. 
* Reduction of risks. 


Typically, when a system is designed without an RA, an organization may accumulate
technical risks and end up with a complex and non-optimal implementation
architecture. [D2.15]




In industry, complex infrastructures for big data systems and high-performance
computing (HPC) have been developed and proven to sustain intensive data processing
services (Netflix, Facebook, Twitter, LinkedIn, etc.). The architectures and
technologies of these world-class infrastructures have been published, and RAs have
been designed and proposed on that basis. However, very few solutions have been
published for the Financial and Insurance sectors, and this work aims to partially
fill that gap. The following sections consider some relevant Reference
Architectures and Models, along with their relevance to the sector that INFINITECH
targets. [D2.15]

The purpose of a Reference Architecture is to provide a conceptual and logical
schema for solutions to a large class of problems. In the INFINITECH project the
domain is as vast as the Financial and Insurance Sectors, where most applications
are data-driven. The problems addressed by the project (pilots and use cases), and
the service management of financial institutions and insurance companies in
general, are largely based on data that must be managed as safely and securely as
possible. [D2.13]

In these domains, enormous customer datasets must be processed to derive
information that enables better and more competitive services. This processing must
respect complex and sometimes conflicting regulatory frameworks covering privacy,
security, interoperability, etc. [D2.13]

Therefore, a reference architecture should explore in advance the specific domains
in which solutions to the class of problems must be found. It should provide a
general model to which stakeholders (end-users, business owners, designers, data
scientists, developers, maintainers, etc.) can refer for best practices in the
specific problem-solution space. In information technology, an RA can be used to
check solutions to a particular problem in that class against best practices and
specific technologies. The INFINITECH-RA is no exception. It results from analyzing
the significant number of use cases in the project's pilots, their requirements
(user stories) and constraints (regulatory, sector and technological), as well as
state-of-the-art technologies and similar architectures. [D2.13]

It is important to state what the RA is in the INFINITECH project:


* A set of views for the Logical, Process, Development and Physical
implementations.
* A set of common scenarios referring to generic use cases.
* A way to verify the use cases' scenarios and solutions.
* A way to speak the same language among stakeholders.
* A way to leverage solutions referring to best practices and building blocks.
* A way to verify whether constraints (requirements, regulatory, technical and
logical) have been addressed properly.




It is also important to state what the INFINITECH RA is NOT: 


* A ready-to-deploy technological IT framework.
* A rigid and immutable set of connecting building blocks.
* A set of mandatory rules for development and integration.
* A manual for implementation and rollouts.
* A one-size-fits-all recipe for all business cases.
Chapter 2


Innovative Technologies for the 
Financial Sector 


2.1 Innovative Technologies for Financial Sector 


The INFINITECH project relies on the capabilities of its partner members to produce
value that can be exploited beyond the lab or a Proof of Concept. Figure 2.1 shows
the INFINITECH Innovation Roadmap. In this roadmap, expertise from academia and
research converges with industrial products and exploitation plans, using design
principles and reference architectures to create a reference implementation for the
financial and insurance sectors. The participation of stakeholders complements this
activity and brings value to the process of transforming ideas into innovation.
[Innovation Readiness Assessment]

The data management functionalities of existing BigData and IoT platforms are not
sufficient to fulfil the needs of financial/insurance applications. These
applications need integrated and unified access over a wide range of fragmented
“siloed” data. They must handle data streams and data at rest at the same time, and
they require cost-effective execution of advanced data mining algorithms.


INFINITECH will provide solutions for integrated data management over the wide
range of databases and data sources used by BigData, IoT and AI applications in
finance/insurance. These include OLTP, OLAP, streaming, structured, unstructured and
semi-structured data sources, and more. The integrated data management solutions
of the project will include:

(i) a solution for integrated OLTP & OLAP data processing (i.e., so-called Hybrid
Transactional/Analytical Processing, HTAP), which will facilitate unified access
over data residing in both OLAP systems and operational databases without the need
for tedious and expensive ETLs;
(ii) a solution for integrated querying over both streaming data and data at rest;
(iii) a solution for integrated and unified access over both SQL and NoSQL
databases, i.e., unified access over structured, unstructured and semi-structured
data;
(iv) a solution for intelligent data pipelining and automated parallelization of
stateful streaming engines, to facilitate distributed high-performance analytics.
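The sketch below illustrates the HTAP idea from item (i). Transactional writes and an analytical aggregate run against the same store, so no ETL copy into a separate warehouse is needed before the analysis. Python's built-in `sqlite3` module serves purely as a stand-in. SQLite is not an HTAP engine and is not the INFINITECH data management layer, and the `transactions` table is hypothetical.

```python
import sqlite3

# One in-memory store standing in for a database that serves both workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
)


def record_payment(account: str, amount: float) -> None:
    """OLTP side: a short write transaction."""
    with conn:  # the connection context manager commits on success, rolls back on error
        conn.execute(
            "INSERT INTO transactions (account, amount) VALUES (?, ?)",
            (account, amount),
        )


def totals_per_account() -> list[tuple[str, float]]:
    """OLAP side: an aggregate over the same, freshly written rows (no ETL step)."""
    return conn.execute(
        "SELECT account, SUM(amount) FROM transactions "
        "GROUP BY account ORDER BY account"
    ).fetchall()


record_payment("ACC-1", 100.0)
record_payment("ACC-2", 40.0)
record_payment("ACC-1", 25.5)
print(totals_per_account())  # [('ACC-1', 125.5), ('ACC-2', 40.0)]
```

A production HTAP system must also keep analytical scans from blocking transactional writes at scale. This toy example does not address that problem.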

INFINITECH will enable integrators and data scientists working on digi- 
tal finance/insurance solutions to access data in a unified and integrated way, 
regardless of the underlying database/data store and the type of data handled (i.e. 
streaming/at-rest). This will facilitate the development of BigData/AI/IoT applica- 
tions, through providing unified data access and query APIs, while at the same time 
improving the performance of queries and analytical algorithms. 
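One way to picture a unified data access API is a single query function that works over adapters for heterogeneous sources. The sketch below covers a relational table (structured data at rest), JSON documents (semi-structured, NoSQL-style) and an event stream (data in flight). It is a minimal sketch under stated assumptions: the `Source` interface, the adapter classes and `unified_query` are hypothetical illustrations, not the INFINITECH query API.

```python
import json
import sqlite3
from typing import Callable, Iterable, Iterator, Protocol

Record = dict


class Source(Protocol):
    """Hypothetical common interface that every data source adapter implements."""
    def scan(self) -> Iterator[Record]: ...


class SqlSource:
    """Adapter over a relational table (structured data at rest)."""
    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn, self.table = conn, table  # table name assumed trusted (not user input)

    def scan(self) -> Iterator[Record]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [c[0] for c in cur.description]
        for row in cur:
            yield dict(zip(cols, row))


class JsonDocSource:
    """Adapter over JSON documents (semi-structured data)."""
    def __init__(self, docs: Iterable[str]) -> None:
        self.docs = docs

    def scan(self) -> Iterator[Record]:
        for doc in self.docs:
            yield json.loads(doc)


class StreamSource:
    """Adapter over an event stream (data in flight); it can be consumed only once."""
    def __init__(self, events: Iterator[Record]) -> None:
        self.events = events

    def scan(self) -> Iterator[Record]:
        yield from self.events


def unified_query(sources: Iterable[Source],
                  predicate: Callable[[Record], bool]) -> list[Record]:
    """Apply one predicate across all sources, whatever the underlying store."""
    return [rec for src in sources for rec in src.scan() if predicate(rec)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (account TEXT, amount REAL)")
conn.execute("INSERT INTO tx VALUES ('ACC-1', 5000.0), ('ACC-2', 20.0)")
docs = ['{"account": "ACC-3", "amount": 9000.0, "channel": "mobile"}']
stream = iter([{"account": "ACC-4", "amount": 7500.0}])

large = unified_query(
    [SqlSource(conn, "tx"), JsonDocSource(docs), StreamSource(stream)],
    lambda r: r["amount"] > 1000,
)
print([r["account"] for r in large])  # ['ACC-1', 'ACC-3', 'ACC-4']
```

In this design, the caller only ever sees plain records. Each adapter hides the specifics of its own store, which is the point the paragraph above makes about unified access.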

Moreover, applications in the finance and insurance sectors must comply with many
complex regulations. This holds for most BigData and IoT applications, which tend to
be data-intensive and to involve complex data processing across multiple systems
and stakeholders.

The project ensures that the INFINITECH platform is secure and regulatory compliant
by design. It does so by providing relevant data governance and regulatory compliance
technologies in line with the INFINITECH-RA. In particular, the project will design
and implement technological building blocks for anonymization, authentication
against eIDAS, consent management and data policy management. These building blocks
will be correlated with the main regulations that financial and insurance
organizations must support, such as PSD2, 4MLD, MiFID II and GDPR. The project's
tools will be used to develop tools for checking compliance with these regulations.
As part of this objective, the BigData and IoT platforms/toolkits used in the
project will be enhanced with a data governance and regulatory compliance layer.
This layer will be flexibly integrated into BigData and IoT applications in various
configurations, in line with their requirements and the INFINITECH-RA.
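To give a feel for what consent-management and anonymization building blocks do, the sketch below releases a customer record for a stated purpose only if consent is registered. Before release, it replaces the direct identifier with a keyed hash. It is a minimal sketch under stated assumptions. The key, the consent registry and the field names are hypothetical. Keyed hashing is pseudonymization in GDPR terms, not anonymization, because whoever holds the key can re-link the data.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret; in practice it would be held in a key management system.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical consent registry: customer id -> purposes the customer agreed to.
CONSENTS = {"cust-001": {"credit_scoring"}, "cust-002": set()}


def pseudonymize(customer_id: str) -> str:
    """Replace a direct identifier with an HMAC-SHA256 digest (64 hex chars).
    This is pseudonymization, not anonymization: the key holder can re-link it."""
    return hmac.new(PSEUDONYM_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_analytics(record: dict, purpose: str) -> Optional[dict]:
    """Release a record for `purpose` only if consent exists, after dropping
    direct identifiers and adding a pseudonym in their place."""
    if purpose not in CONSENTS.get(record["customer_id"], set()):
        return None  # no consent recorded for this purpose
    safe = {k: v for k, v in record.items() if k not in {"customer_id", "name"}}
    safe["pseudonym"] = pseudonymize(record["customer_id"])
    return safe


r1 = prepare_for_analytics(
    {"customer_id": "cust-001", "name": "A. Smith", "balance": 1200}, "credit_scoring")
r2 = prepare_for_analytics(
    {"customer_id": "cust-002", "name": "B. Jones", "balance": 300}, "credit_scoring")
print(sorted(r1))  # ['balance', 'pseudonym']
print(r2)          # None
```

A keyed hash (rather than a plain hash) means an attacker without the key cannot rebuild the mapping by hashing guessed customer identifiers.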

Further, existing BigData/IoT applications in the financial and insurance sectors
in most cases form disaggregated data “silos”. These silos are hardly interoperable
with the systems and applications of other financial institutions and administrative
domains. Likewise, there is very poor interoperability across the diverse datasets
that are typically collected and used in financial/insurance applications (including
FinTech and InsuranceTech applications).

INFINITECH introduces building blocks for semantic interoperability and
interoperable data exchange. These building blocks facilitate the development and
deployment of innovative applications that span multiple systems and stakeholders in
the financial supply chain (e.g., cross-border transactions, SWIFT payments,
blockchain applications). As part of this objective, two solutions for
interoperability and data exchange will be developed:

(A) A centralized solution based on an interoperability database/registry. It
supports linking diverse systems and datasets through shared semantics, as well as
semantically interoperable analytics. The semantic interoperability solution will
leverage mainstream ontologies and identifiers used in the financial sector (e.g.,
FIBO (Financial Industry Business Ontology) and FIGI (Financial Instrument Global
Identifier)). It will combine them with distributed engines for massively
parallelized, high-performance analytics over semantic data streams and related
ontologies (i.e., high-performance execution of SPARQL queries).

(B) A decentralized solution based on distributed ledger technologies and
permissioned blockchain concepts for decentralized trust. It will rely on
infrastructures for permissioned blockchains that will become part of the
INFINITECH solutions & testbeds. These infrastructures will facilitate
financial/insurance processes that involve cross-organization data exchange (e.g.,
credit risk scoring and KYC/KYB processes). The project's blockchain solution will be
augmented with functionalities for tokenization and secure, privacy-friendly
querying. Tokenization will enable asset trading. Secure querying will enable
queries over blockchain data (e.g., customer data) without decrypting the source
data, providing strong privacy and data protection for applications that need it
(e.g., KYC, customer personalization).
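The decentralized solution (B) rests on two ideas: only admitted participants may write, and each record commits to its predecessor so that tampering becomes detectable. The toy sketch below shows both ideas with a hash-chained, append-only log. It is a minimal sketch under stated assumptions. It is not a blockchain: it has no consensus, networking, encryption or tokenization, and the participant names and payloads are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class PermissionedLedger:
    """Toy append-only log. Only registered members may append, and each block
    stores the hash of its predecessor. Unlike a real DLT, there is no consensus,
    replication or encryption here."""
    members: set[str]
    blocks: list[dict] = field(default_factory=list)

    @staticmethod
    def _hash(block: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def append(self, member: str, payload: dict) -> None:
        if member not in self.members:
            raise PermissionError(f"{member} is not a permissioned participant")
        prev = self._hash(self.blocks[-1]) if self.blocks else "0" * 64
        self.blocks.append({"member": member, "payload": payload, "prev_hash": prev})

    def verify(self) -> bool:
        """Recompute the links; editing any block breaks its successor's prev_hash.
        (The newest block is only protected once a successor exists.)"""
        return all(
            self.blocks[i]["prev_hash"] == self._hash(self.blocks[i - 1])
            for i in range(1, len(self.blocks))
        )


ledger = PermissionedLedger(members={"bank_a", "bank_b"})
ledger.append("bank_a", {"kyc_check": "pseudonym-1f3a", "status": "verified"})
ledger.append("bank_b", {"kyc_check": "pseudonym-1f3a", "status": "reused"})
print(ledger.verify())  # True
ledger.blocks[0]["payload"]["status"] = "rejected"  # simulate tampering
print(ledger.verify())  # False
```

The example ties in with the cross-organization KYC use case mentioned above: a second bank can reuse a recorded check, and any later edit to that record is detectable.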

Novel AI & BigData analytics applications for the finance and insurance sectors 
must combine a multitude of advanced analytics techniques (e.g., supervised, unsu- 
pervised and reinforcement learning models) over a great variety of data stores. To 
this end, they can greatly benefit from mechanisms and tools that facilitate access 
to analytics functions (including APIs), as well as from low-latency techniques & 
algorithms that could boost real-time analytics functions. 

INFINITECH will develop a range of enablers for efficient, high-performance 
analytics that combine data from multiple sources and enable low latency, near 
real-time operations. To facilitate high-performance analytics, INFINITECH will
parallelize incremental algorithms that are commonly used in finance/insurance
applications (e.g., clustering, collaborative filtering and frequent pattern matching)
as a means of accelerating their execution. At the same time, to facilitate the
development of analytics applications, INFINITECH will provide a declarative analytics
framework that will enable analytics over diverse data sources based on conventional
SQL-like primitives, in a way that handles the underlying complexity of the
data sources.
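Algorithms such as clustering lend themselves to parallelization because their state can often be expressed as additive statistics. The Python sketch below illustrates this for one pass of k-means:

* Each data partition computes per-cluster sums and counts independently.
* The partial results are merged in any order.
* The centroids are updated from the merged totals.

This is a simplified illustration of the general pattern, not INFINITECH's implementation, and the function names are hypothetical. The partitions are processed sequentially here, although each `partial_stats` call could run on a separate worker.

```python
from functools import reduce
from math import dist


def assign(point, centroids):
    # Index of the nearest centroid by Euclidean distance.
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def partial_stats(points, centroids):
    # Per-cluster coordinate sums and point counts for one data partition.
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = assign(p, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def merge(a, b):
    # Sums and counts are additive, so partial results combine in any order.
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def update_centroids(centroids, stats):
    # New centroid = mean of its assigned points; empty clusters keep the old one.
    sums, counts = stats
    return [
        tuple(s / n for s in row) if n else c
        for row, n, c in zip(sums, counts, centroids)
    ]


centroids = [(0.0, 0.0), (10.0, 10.0)]
partitions = [[(1, 1), (0, 2)], [(9, 9), (11, 10), (0, 0)]]
stats = reduce(merge, (partial_stats(p, centroids) for p in partitions))
print(update_centroids(centroids, stats))  # [(0.3333333333333333, 1.0), (10.0, 9.5)]
```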

Moreover, the project will offer a library of ML/DL algorithms for analytics in the
finance/insurance sector, including both conventional algorithms and parallelized 
incremental algorithms. The project's ML/DL algorithms will also be made avail- 
able as part of open data science frameworks and repositories such as OpenML, as 
a means of boosting their sustainability and wider use. 

Overall, the challenges in the Fin/InsurTech sector can be categorized as
follows:

* Inability to perform analytics over operational data.
  o No real-time Business Intelligence.
* No integrated and unified access over fragmented, “siloed” data.
* No real-time ingestion and analysis of data streams.
* No integrated query processing of data at rest together with data in flight.
* No cost-effective execution of advanced data mining algorithms.
  o Data rates are dynamic, so data and computation capabilities must scale accordingly.
* Numerous and quite complex regulations in the finance/insurance sector.
* BigData/IoT applications tend to be data-intensive and to involve complex
data processing across multiple systems and stakeholders.


To tackle these challenges, INFINITECH developed the following technologies:

* Data management layer
  o HTAP (Hybrid Transactional/Analytical Processing) capabilities
  o Data ingestion at very high rates
  o Polyglot query processing
* Streaming processing framework
  o Integrated query processing of data in-flight and at-rest
  o Auto-scalability of parallelized streaming operators
  o Intelligent data pipelines
* Data regulation and governance
  o Provision of data governance mechanisms
  o Regulatory constraints tools
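The auto-scalability item above implies a policy that maps the observed input rate to a number of parallel operator instances. The text does not specify the rule FinFlink uses. The Python sketch below therefore shows one common, hypothetical throughput-based rule: provision enough instances that each runs at no more than a target utilization (`headroom`), clamped to configured bounds. The function names, parameters and default values are assumptions for illustration.

```python
import math


def target_parallelism(input_rate, per_instance_rate, *,
                       headroom=0.8, min_p=1, max_p=64):
    """Hypothetical rule: instances needed so each stays at or below `headroom` utilization.

    input_rate:        observed events per second entering the operator
    per_instance_rate: events per second one instance can process at full load
    """
    if per_instance_rate <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid capacity or headroom")
    needed = math.ceil(input_rate / (per_instance_rate * headroom))
    return max(min_p, min(max_p, needed))


def scaling_action(current, target):
    # Translate the target into the action a scaler would request.
    if target > current:
        return "scale_up", target
    if target < current:
        return "scale_down", target
    return "no_change", current


# 45,000 events/s with 10,000 events/s per instance at 80% target utilization.
target = target_parallelism(45_000, 10_000)
print(target, scaling_action(4, target))  # 6 ('scale_up', 6)
```

In practice, a scaler would also smooth the measured input rate and apply a cool-down period, so that short bursts do not make the operator scale up and down repeatedly.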


INFINITECH's achievements can be grouped into the following categories:

* Data ingestion at very high rates with linear scalability, ideally suited for
data offloading.
* Data analytics over operational data (HTAP) while data is being ingested
concurrently.
* Analytics on operational data, reducing the need for ETL and enabling real-time
business intelligence.
* Support for advanced analytics combining real-time streaming data with
historical data.
* Implementation of a change data capture mechanism via the Intelligent Data
Pipelines.
* A unified framework for query processing over streaming/static data.
* FinFlink, which enables linear scalability of streaming operators, allowing their
full parallelization.
* Efficient resource consumption via the automation of scaling actions provided
by the FinFlink library.
* A Data Governance and Anonymization Framework.
* A Policy Rules Orchestration Enforcement Framework.
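A small Python sketch can illustrate two ideas from the list above: combining real-time streaming data with historical data, and running analytics on operational data without a separate ETL step. Here the standard-library `sqlite3` module stands in for the data store holding data at rest, and a plain iterator stands in for the stream. The table layout, the `enrich` function and the three-times-average rule for flagging a transaction are hypothetical choices for illustration. They are not part of INFINITECH's data management layer.

```python
import sqlite3


def build_history(conn):
    # Historical (at-rest) transactions per customer.
    conn.execute("CREATE TABLE tx (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO tx VALUES (?, ?)",
        [("c1", 100.0), ("c1", 120.0), ("c2", 15.0), ("c2", 25.0)],
    )


def enrich(stream, conn, factor=3.0):
    """Join each in-flight event with the customer's history, then store the event."""
    for customer, amount in stream:
        (avg,) = conn.execute(
            "SELECT AVG(amount) FROM tx WHERE customer = ?", (customer,)
        ).fetchone()
        # Flag amounts far above the customer's historical average (hypothetical rule).
        suspicious = avg is not None and amount > factor * avg
        yield customer, amount, avg, suspicious
        # Ingest the event into the same table, so later queries see it without ETL.
        conn.execute("INSERT INTO tx VALUES (?, ?)", (customer, amount))


conn = sqlite3.connect(":memory:")
build_history(conn)
events = [("c1", 110.0), ("c2", 90.0), ("c3", 50.0)]
for row in enrich(iter(events), conn):
    print(row)
# ('c1', 110.0, 110.0, False)
# ('c2', 90.0, 20.0, True)
# ('c3', 50.0, None, False)
```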


The impact of applying these technologies can be summarized as follows:

* Analytics on operational data reduces the need for ETL, enabling real-time
business intelligence.
* Advanced analytics combine real-time streaming data with historical data.
* Resource consumption becomes efficient through the automation of scaling
actions provided by the FinFlink library.
* A good safeguard against data misuse, enabling compliance with regulations
such as the GDPR.
* A set of privacy and utility metrics to better address privacy risk while
preserving the utility of the data.
* Creation of user identities without sharing biometric information.
* Orchestration of technologies for preserving privacy, data protection and
security.
* Support for future compliance with new, changed or newly identified
regulations.
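The text does not list which privacy and utility metrics are used. As a hedged illustration, the Python sketch below computes two standard ones:

* k-anonymity: k is the size of the smallest group of records sharing the same quasi-identifier values, and a larger k means lower re-identification risk.
* The discernibility metric: the sum of squared group sizes, where a smaller value means more utility is preserved.

The records, column names and the age-banding function are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    # Count records sharing the same combination of quasi-identifier values.
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    # Privacy: size of the smallest equivalence class.
    return min(equivalence_classes(records, quasi_identifiers).values())


def discernibility(records, quasi_identifiers):
    # Utility cost: each record is penalized by the size of its class.
    return sum(n * n for n in equivalence_classes(records, quasi_identifiers).values())


def generalize_age(age, width=10):
    # Replace an exact age by a band, e.g. 34 -> "30-39".
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


records = [
    {"age": 34, "zip": "46001", "claim": 1200},
    {"age": 37, "zip": "46001", "claim": 300},
    {"age": 52, "zip": "46002", "claim": 800},
    {"age": 58, "zip": "46002", "claim": 150},
]
banded = [dict(r, age=generalize_age(r["age"])) for r in records]
qi = ["age", "zip"]
print(k_anonymity(records, qi), discernibility(records, qi))  # 1 4
print(k_anonymity(banded, qi), discernibility(banded, qi))    # 2 8
```

Generalizing age raises k from 1 to 2 at the cost of a higher discernibility score, which is exactly the privacy/utility trade-off such metrics are meant to expose.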


To deliver on the assigned tasks, INFINITECH applied the following
technologies:

1. A pseudonymization tool that automatically determines the best
anonymization configuration for each application.
2. A tool for anonymizing data that automatically determines the best
anonymization configuration for each application.
3. A mobile digital user onboarding service with a virtual eID derived from
government-issued documents (ePassport or eID card).
4. A solution for authenticating citizens and businesses against the eIDAS
infrastructure, providing a cross-border strong authentication mechanism
based on eIDs.
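The Python sketch below illustrates what a pseudonymization step does. It replaces direct identifiers with keyed hashes (HMAC-SHA-256) from the standard library. This is one common technique, chosen here as an assumption; the text does not describe how the INFINITECH tool works or how it selects its configuration. The same input and key always give the same pseudonym, so records can still be linked across datasets. Without the secret key, however, nobody can recompute the mapping, so the key must be stored separately from the pseudonymized data. The key value and sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    # Deterministic keyed hash: same value and key always yield the same pseudonym.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, fields, key):
    # Return copies with the listed identifier fields replaced; other columns are kept.
    result = []
    for record in records:
        copy = dict(record)
        for name in fields:
            copy[name] = pseudonymize(str(copy[name]), key)
        result.append(copy)
    return result


key = b"replace-with-a-secret-from-a-key-vault"  # hypothetical; never hard-code real keys
customers = [
    {"iban": "ES9121000418450200051332", "name": "Ana", "balance": 1520.0},
    {"iban": "DE89370400440532013000", "name": "Ben", "balance": 310.5},
]
for row in pseudonymize_records(customers, ["iban", "name"], key):
    print(row["iban"][:16], row["balance"])
```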


2.2 INFINITECH Marketplace 


The project's validated solutions at the technological or business level are made 
available at the project multi-sided market platform (marketplace) and/or a Vir- 
tualized Digital Innovation Hub (VDIH) for wider use and commercial exploita- 
tion. The market platform and the VDIH of the project are the points where the 
project will interact with other stakeholders of the digital finance/insurance and 
FinTech/InsuranceTech ecosystem. INFINITECH will build a community around
its results, i.e. one based on stakeholders from all EU-28 countries that will engage
with the market platform and/or the VDIH.

The INFINITECH multi-sided market platform, which offers BigData, AI, IoT,
Blockchain and VDIH solutions, is a public web-based environment with various
APIs. It is able to store several types of assets that may derive from the separate
procedures and mechanisms either implemented within the scope of the project
or not (e.g. third-party contributions through hackathons, webinars, etc.).

The market platform will integrate ready-to-use solutions and assets of the
project, such as (synthetic) datasets/data assets, ML/DL algorithms, as well as
validated turnkey solutions/applications for finance and insurance. At the same time,
the innovation labs and FinTech/InsuranceTech clusters of the consortium will
federate resources towards establishing a virtualized DIH. This DIH will provide a host of
innovation management services to incumbent financial/insurance organizations,
but mainly to FinTech/InsuranceTech innovators. The project's market platform
and VDIH will provide a single pan-European entry point for accessing resources
for BigData, IoT and AI innovations in the finance/insurance sectors. The market
platform and the VDIH are the main enablers of the project's exploitation strategy.

INFINITECH will specify and implement the interactions of these building 
blocks, to support all stakeholders across the entire lifecycle of a novel BigData/IoT 
product or service for finance or insurance. This lifecycle includes several stages 
from the inception of the service to its technical and business validation. The latter 
validation is a prerequisite for the production use of the service, but also for making 
resources associated with the service available as part of the market platform and 
the VDIH of the project. 
The multi-sided marketplace enables the utilization of Big Data and AI
technologies, serving as a single end-point for the assets available in the
marketplace. These include algorithms, datasets, frameworks, webinars and lectures,
Docker containers and combined solutions that are scalable and adaptable to the
business needs of companies and organizations, especially in the FinTech and
InsuranceTech subsectors. The assets accommodated in the marketplace have been
offered by various INFINITECH members, developers and data scientists. They can be
leveraged internally, increasing collaboration and knowledge transfer within
INFINITECH, or externally by developers, service providers, tech companies and
organizations, while promoting the results of the project and its impact on the
finance and insurance sectors. [D8.3]

INFINITECH provides a completely integrated environment enabling the uti- 
lization of big data and AI techniques in the finance and insurance sectors. The 
latter has become feasible through a set of technologies that enable exploitation 
of various datasets (obtained from different sources), optimized data manage- 
ment for these datasets (e.g., across diverse data stores), analytics with innovative 
algorithms covering a wide set of scenarios in the finance and insurance sec- 
tors, as well as use of tailored sandboxes on the underlying infrastructure layer 
for the execution of the aforementioned algorithms. Additionally, the development
progress of the INFINITECH pilots and of the provided sandboxes and testbeds has
increased the INFINITECH ecosystem’s resources, ranging from ML models
and AI algorithms to IoT applications, Blockchain and a variety of ready-to-use
solutions. [D8.3]

The INFINITECH solution goes beyond the utilization of analytics on spe- 
cific datasets for a few pilots/use cases, by aiming at a generalized approach that 
will facilitate the exploitation of various analytics algorithms (provided both by 
INFINITECH researchers/partners and by 3rd party data analysts) on top of dif- 
ferent datasets. To this end, the analytics algorithms need to be made available, to 
be described in terms of functionality, parameters, and offerings, to be accompa- 
nied with datasets that can be used by interested parties in order to validate their 
applicability and performance and to be offered as ready-to-be-executed solutions 
(e.g., containerized) in order to increase their utilization. [D8.3] 

All of the above are representative functionalities of the INFINITECH market
platform, which is at an operational stage (accessible at
https://marketplace.infinitech-h2020.eu/) with various updates and functionalities
introduced after the initial version. The marketplace holds and offers solutions for
realizing big data and AI techniques in the finance and insurance sectors. [D8.3]

Based on the above, INFINITECH's multi-sided market platform aims to be
one of the project's main ambassadors to the big data and AI communities.
It is a single, public and hybrid system with many different APIs, covering all the
different required perspectives of the platform. The various APIs developed for
users, descriptions and assets are described in the corresponding project
deliverable. [D8.3]

The market platform offers big data and AI solutions, as well as IoT and 
Blockchain solutions, and VDIH Services. Thus, the INFINITECH market plat- 
form is a four-perspective, unified environment being able to store several types of 
assets (e.g., algorithms, descriptions of algorithms, evaluation and validation results, 
datasets, experimentation outcomes, etc.) in any format. [D8.3] 

The INFINITECH marketplace is already deployed and available at
https://marketplace.infinitech-h2020.eu/. It is currently populated with various
assets and offerings, which include: [D8.3]


* Algorithms
* Datasets
* Frameworks & mechanisms
* Scientific studies/tutorials
* Videos/webinars/lectures
* Experimentation results
* VDIH services
* End-to-end solutions
* Docker containers
* Combined solutions


The assets above have been provided by an extended audience, which includes 
INFINITECH Project members, developers, data scientists, VDIH services 
providers, as well as other authorized third-party users. 

The available offerings could be leveraged by a variety of end-users, with
particular emphasis on the FinTech and InsuranceTech subsectors. [D8.3]

As the INFINITECH project and its pilots progress, new scalable and adaptable
assets are introduced on a regular basis and made available to the end-users, who
can then deploy and execute them based on their needs. Additional external
stakeholders, organizations and service providers are also encouraged to
accommodate their assets in the INFINITECH marketplace. [D8.3]

The marketplace platform provides several functionalities that are
mapped to different layers. The back-end includes three layers (i.e.
Assets Storage Layer, Assets Management Layer, and Interaction Layer), while the
front-end includes one layer (i.e. Presentation Layer). The four layers of the
marketplace and their primary functionalities are depicted in Figure 2.3 and
summarized below: [D8.3]


* The Assets Storage Layer (part of the back-end) is the layer in which the
platform's offered assets are stored (see Figure 2.2).
Figure 2.2. Snapshot from the “Assets” page on INFINITECH’s Marketplace web page.
Figure 2.3. Market platform's layers and main functionalities.


* The Assets Management Layer (part of the back-end) delivers all the principles
and techniques needed for the management of the marketplace's assets.
* The Interaction Layer (part of the back-end) supports the communication
between the market platform and its users (i.e. human users and machine
users) by providing discrete APIs for exploiting each different type of asset.
* The Presentation Layer (part of the front-end) provides the User Interface
for the different types of users that are willing to use the platform.


The market platform is structured around two core components, the back-end
and the front-end. This separation enhances the platform's functionality and allows
it to provide additional information and capabilities. End-users interact with the
market platform through the front-end (the Presentation Layer), which offers a
user-friendly UI. Additional services (e.g. from 3rd parties) can instead be integrated
directly with the back-end (through the Interaction Layer). [D8.3]

The back-end, which contains structured information and the assets offered by 
the market platform, is considered the core of the marketplace and its functional- 
ities. The front-end component is a user-friendly UI which presents to the users 
the offered content (the assets and their information), allowing them to inter- 
act with the platform in an easy and effective way. This section provides a short 
overview of the core components of the marketplace, describing some of their key 
features. [D8.3] 


2.2.1 Back-end


The back-end is the main component of the marketplace. It consists of three
different layers and implements the main functionalities for asset management.
The three layers are briefly described below. [D8.3]

The Assets Storage Layer is responsible for storing the assets that will be offered 
by the market platform. An essential component of this layer is the database that can 
store files in any format as well as additional information about the files provided. 
In this context, the type of database that is used is a document-oriented NoSQL 
database, which stores both JSON-like documents (the format of the descriptions 
files that are analyzed in the Assets Management Layer) and binary files, using 
extended specifications (e.g. file system). [D8.3] 
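The division of labour in the Assets Storage Layer can be sketched in a few lines of Python: JSON-like metadata documents go in the database, and large binary files go on the file system. In this hypothetical stand-in, a plain dictionary plays the role of the document collection so that the example runs without a database server. The `AssetStore` class, its methods and the metadata fields (`file_path`, `size`, `sha256`) are assumptions, not the marketplace's schema.

```python
import hashlib
import tempfile
import uuid
from pathlib import Path


class AssetStore:
    """Hypothetical sketch: metadata documents in a dict, asset payloads on disk."""

    def __init__(self, root: Path):
        self.root = root
        self.documents: dict[str, dict] = {}  # stands in for a NoSQL collection

    def put(self, description: dict, payload: bytes) -> str:
        asset_id = uuid.uuid4().hex
        path = self.root / asset_id
        path.write_bytes(payload)  # binary file goes to the file system
        # The JSON-like document keeps the description plus a pointer to the file.
        self.documents[asset_id] = dict(
            description,
            _id=asset_id,
            file_path=str(path),
            size=len(payload),
            sha256=hashlib.sha256(payload).hexdigest(),
        )
        return asset_id

    def get(self, asset_id: str) -> tuple[dict, bytes]:
        doc = self.documents[asset_id]
        return doc, Path(doc["file_path"]).read_bytes()

    def delete(self, asset_id: str) -> None:
        # Remove both the metadata document and the stored file.
        doc = self.documents.pop(asset_id)
        Path(doc["file_path"]).unlink()


with tempfile.TemporaryDirectory() as tmp:
    store = AssetStore(Path(tmp))
    asset_id = store.put({"title": "Synthetic loans", "type": "dataset"}, b"id,amount\n1,500\n")
    doc, data = store.get(asset_id)
    print(doc["title"], doc["size"], data == b"id,amount\n1,500\n")  # Synthetic loans 16 True
```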

The Assets Management Layer is responsible for the entire life cycle of the assets 
within the platform and offers all the principles and techniques for their manage- 
ment. Specifically, this layer handles the assets from the moment they are added 
to the platform through the APIs and then stored in the database (Assets Storage 
Layer), until they are to be deleted for any purpose from the platform. Through 
this layer, the market platform supports the CRUD operations and searching func- 
tionality, which are triggered by the corresponding APIs of the back-end (Assets 
Interaction Layer). The back-end is a REST API that receives different HTTP
requests in order to perform an operation or trigger a functionality. Moreover, every
available asset must have a description file containing metadata about the asset
(in JSON format). These mandatory description files make the assets searchable and
retrievable by the end-users of the marketplace. [D8.3]
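Because the description files are what make assets searchable, the back-end has to reject assets whose description lacks mandatory metadata. The Python sketch below shows that check and a simple keyword search over stored descriptions. The set of mandatory fields is a hypothetical example; the text does not give the marketplace's actual description schema.

```python
import json

# Hypothetical mandatory fields of a description file.
REQUIRED_FIELDS = {"title", "type", "provider", "description"}


def validate_description(text: str) -> dict:
    """Parse a JSON description file and check that mandatory fields are present."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("description must be a JSON object")
    missing = REQUIRED_FIELDS - {k for k, v in doc.items() if v not in (None, "")}
    if missing:
        raise ValueError(f"missing mandatory fields: {sorted(missing)}")
    return doc


def search(docs, term: str):
    """Case-insensitive keyword match on the title and description fields."""
    term = term.lower()
    return [d for d in docs
            if term in d["title"].lower() or term in d["description"].lower()]


catalog = [validate_description(json.dumps({
    "title": "Credit risk scoring model",
    "type": "algorithm",
    "provider": "Partner A",
    "description": "Gradient-boosted model for SME credit risk.",
}))]
print([d["title"] for d in search(catalog, "credit")])  # ['Credit risk scoring model']
```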

The last layer, the Assets Interaction Layer, is responsible for supporting the 
communication between the market platform and its end-users. It implements the 
interfaces (APIs) of the back-end that will handle the back-end’s operations. As 
described before, these APIs receive HTTP requests that trigger the CRUD
operations for both assets and description files. [D8.3]
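The mapping from HTTP requests to CRUD operations can be shown with a minimal WSGI application that uses only the Python standard library. The marketplace itself uses Flask, as described in Section 2.3; this sketch avoids that dependency. The `/assets` routes, the in-memory `ASSETS` dictionary and the JSON responses are hypothetical. POST creates an asset description, GET reads it, PUT updates it and DELETE removes it.

```python
import itertools
import json

ASSETS: dict[str, dict] = {}   # in-memory stand-in for the storage layer
_next_id = itertools.count(1)  # simple id generator for the sketch


def app(environ, start_response):
    """Toy WSGI app mapping HTTP methods on /assets to CRUD operations."""
    method = environ["REQUEST_METHOD"]
    parts = environ.get("PATH_INFO", "/").strip("/").split("/")

    def read_json():
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return json.loads(environ["wsgi.input"].read(length) or b"{}")

    # Unknown routes and unsupported methods fall through to 404 in this sketch.
    status, body = "404 Not Found", {"error": "not found"}
    if parts == ["assets"]:
        if method == "POST":                              # Create
            asset_id = str(next(_next_id))
            ASSETS[asset_id] = read_json()
            status, body = "201 Created", {"id": asset_id}
        elif method == "GET":                             # List
            status, body = "200 OK", ASSETS
    elif len(parts) == 2 and parts[0] == "assets" and parts[1] in ASSETS:
        asset_id = parts[1]
        if method == "GET":                               # Read
            status, body = "200 OK", ASSETS[asset_id]
        elif method == "PUT":                             # Update
            ASSETS[asset_id] = read_json()
            status, body = "200 OK", ASSETS[asset_id]
        elif method == "DELETE":                          # Delete
            del ASSETS[asset_id]
            status, body = "200 OK", {"deleted": asset_id}

    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app. In production, a WSGI server such as Gunicorn would host it instead.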


2.2.2 Front-end 


The front-end is the fourth layer of the market platform. It is a web-based server that 
presents the offered assets to the users, with a friendly UI. The front-end converts 
all interfaces of the back-end (REST API) into user friendly interfaces and pro- 
vides automated forms and processes that make it easier for users to interact with 
the back-end and benefit from its stored assets. It therefore acts as an intermediary
between the marketplace users and the back-end, sending the respective HTTP
requests to the latter and presenting its responses. [D8.3]
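The front-end acts as an intermediary: it turns a filled-in form into an HTTP request for the back-end. This can be sketched with Python's `urllib.request`. The back-end address, the `/assets` path and the convention of dropping empty form fields are hypothetical. The request object is only built here; sending it would be a call to `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

BACKEND_URL = "http://backend:5000"  # hypothetical internal address of the back-end


def build_upload_request(form: dict) -> urllib.request.Request:
    """Turn submitted form fields into a JSON description POSTed to the back-end."""
    # Drop fields the user left blank, and trim surrounding whitespace.
    description = {k: v.strip() for k, v in form.items() if v.strip()}
    return urllib.request.Request(
        f"{BACKEND_URL}/assets",
        data=json.dumps(description).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_upload_request({"title": " Fraud detector ", "type": "algorithm", "notes": ""})
print(req.get_method(), req.full_url)  # POST http://backend:5000/assets
print(req.data)  # b'{"title": "Fraud detector", "type": "algorithm"}'
```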

In short, the front-end allows users to:

* register and log in to the marketplace (a user-based platform);
* upload their offered assets by filling out appropriate forms, whose fields become
the content of the assets' description files;
* search for assets by various fields (title, asset type, fields of use, provider,
other metadata, etc.), with results that can be further filtered or sorted by
number of views or upload date.

A dedicated page also presents each asset's information in detail, and from this
page users can retrieve the actual asset files (see Figure 2.4). [D8.3]
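The search behaviour described above amounts to a filter followed by a sort: filtering by fields, then sorting by views or upload date. The Python sketch below shows this over in-memory records. The field names (`type`, `provider`, `views`, `uploaded`) and the sample assets are hypothetical. A real deployment would push the filtering down to the database query instead.

```python
from datetime import date

assets = [  # hypothetical sample records
    {"title": "Credit scoring model", "type": "algorithm", "provider": "Partner A",
     "views": 120, "uploaded": date(2021, 5, 3)},
    {"title": "Synthetic claims", "type": "dataset", "provider": "Partner B",
     "views": 45, "uploaded": date(2022, 1, 17)},
    {"title": "Fraud detector", "type": "algorithm", "provider": "Partner B",
     "views": 300, "uploaded": date(2020, 11, 9)},
]


def find_assets(items, *, asset_type=None, provider=None,
                sort_by="views", descending=True):
    """Filter on optional exact-match fields, then sort by views or upload date."""
    if sort_by not in ("views", "uploaded"):
        raise ValueError("sort_by must be 'views' or 'uploaded'")
    hits = [a for a in items
            if (asset_type is None or a["type"] == asset_type)
            and (provider is None or a["provider"] == provider)]
    return sorted(hits, key=lambda a: a[sort_by], reverse=descending)


print([a["title"] for a in find_assets(assets, asset_type="algorithm")])
# ['Fraud detector', 'Credit scoring model']
print([a["title"] for a in find_assets(assets, sort_by="uploaded")])
# ['Synthetic claims', 'Credit scoring model', 'Fraud detector']
```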

Several updates and improvements have been made to the INFINITECH Marketplace
front-end. The initially defined structure was reorganized to accommodate
the new content and facilitate navigation within the INFINITECH Marketplace.
New pages were created and some existing ones were improved. This update
also brought new functionalities to INFINITECH Marketplace users.

The main change to the structure of the INFINITECH Marketplace concerned the VDIH
page, previously named INFINITECH Academy. The name was changed to match the
focus of the INFINITECH Marketplace, which is to “...establish a market platform
that will provide access to the project’s solutions, along with a Virtualized Digital
Innovation Hub (VDIH) that will support innovators (FinTech/InsuranceTech) in their
BigData/AI/IoT endeavors”. Figure 2.5 shows the current structure of the
INFINITECH Marketplace.

The diversity of VDIH resources led to the organization of the page into
Training Activities and Innovation Services. The Training Activities page includes
courses, webinars and workshops, while the Innovation Services page includes the
accelerator programmes. A new page was created for each type of VDIH content,
bringing together all the resources on the platform related to that content.

The updates are not restricted to the structure of the INFINITECH Marketplace:
existing pages were updated to support new functionalities, and new pages were
created to present new content.
Figure 2.4. Snapshot of the INFINITECH marketplace home web page.
Figure 2.5. INFINITECH marketplace structure.


Figure 2.6. INFINITECH marketplace resources.


The statistics show the type and number of resources available on the
platform. As Figure 2.6 shows, the INFINITECH Marketplace currently contains
assets, courses, workshops and accelerator programmes, corresponding to a total of
192 resources.

The INFINITECH Assets section provides access to the assets page and the 
INFINITECH VDIH section provides access to Training Activities and Innova- 
tion Services. Both sections were visually updated, making it easier to access the 
resources. 

The last section of the Homepage is dedicated to new functionalities and has 
been updated to highlight social login (Sign in) and forms (Add New Information). 
The forms allow the INFINITECH users to provide information to the platform. 
Figure 2.7. INFINITECH marketplace VDIH resources.


This new feature will be explained in the New Functionalities of Marketplace 
section. 


VDIH [D8.5] 
The VDIH name has been changed to match the focus of the INFINITECH
Marketplace. The page was also organized into Training Activities and Innovation
Services: the Training Activities include courses, workshops and webinars, while the
Innovation Services include accelerator programs. Accelerator programs are services
that support and guide startups and SMEs in creating innovation through expert
advice, co-working space, education and skills development, among others.

As presented in Figure 2.7 above, the INFINITECH Marketplace currently has
a total of 135 VDIH resources, which correspond to courses, workshops, webinars
and accelerator programs.

All VDIH content pages are similar, and Figure 2.8 shows the Courses page as an
example. The VDIH pages provide all the resources available on the platform for a
given type of content, as well as the following features for the user:


1. Filter by source: distinguishes whether a resource belongs to INFINITECH
partners or to external entities, through the options ALL, INFINITECH Course
(in the case of the demonstration) or External.

2. Red Flag with the INFINITECH symbol: related to the filter, it differentiates the
source of the resources by highlighting those that belong to INFINITECH
partners.
Figure 2.8. Courses page demonstrating features.

3. Add new resource: in this example, the button reads “Add new course” and gives
access to the course form. Each type of VDIH content has its own form that allows
INFINITECH users to provide information. This feature will be explained
in the New Functionalities of Marketplace section.
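The filter-by-source option and the red flag described in the list above are two views of the same attribute: whether a resource comes from an INFINITECH partner. A small Python sketch of that logic follows. The `provider_is_partner` field, the `Source` enumeration and the sample titles are hypothetical, not the marketplace's data model.

```python
from enum import Enum


class Source(Enum):
    ALL = "ALL"
    INFINITECH = "INFINITECH"
    EXTERNAL = "External"


def filter_by_source(resources, source: Source):
    """Keep resources matching the selected source; mark partner ones with the red flag."""
    selected = []
    for r in resources:
        is_partner = r["provider_is_partner"]
        # INFINITECH keeps partner resources, EXTERNAL keeps the rest, ALL keeps everything.
        if source is Source.ALL or (source is Source.INFINITECH) == is_partner:
            selected.append({**r, "red_flag": is_partner})
    return selected


courses = [
    {"title": "Sample partner course", "provider_is_partner": True},
    {"title": "Sample external course", "provider_is_partner": False},
]
print([c["title"] for c in filter_by_source(courses, Source.EXTERNAL)])  # ['Sample external course']
```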


Training Activities 

The INFINITECH Marketplace is a network with relevant resources about IoT, 
Blockchain, BigData and AI for finance/insurance. Training Activities is a new 
page of this network with a variety of courses, workshops and webinars that 
offer INFINITECH users the opportunity to improve their skills, knowledge or 
expertise. 


Courses 


The Courses page lists all the courses available on the INFINITECH Marketplace.
Each course has its own page with information that characterizes it, such as a brief
description, who provided the course, difficulty level, duration, cost and the
platform where the course will be available, among others. In addition, users can
learn more about the course through the website linked on the page.

Figures 2.9, 2.10 and 2.11 show all the courses available on the INFINITECH
Marketplace so far, a total of 71 VDIH resources.


Workshops 


The Workshops page lists all the workshops present on the INFINITECH
Marketplace. Each workshop has its own page with information that characterizes it,
such as a brief description, duration, speakers, presentation files and videos,
as can be seen in Figure 2.12.


Webinars 


The Webinars page brings together all the webinars available on the INFINITECH 
Marketplace. Each webinar has its own page, with some information that charac- 
terizes it, such as a brief description, date, organizer, as well as additional files and 
videos (See Figure 2.13). 


Innovation Services 


The Innovation Services page was added to the INFINITECH Marketplace network,
and it is where information on accelerator programs can be found. The page is
intended as a place where users can find support for a self-determined and
sustainable approach to digital pioneering. This support helps to accelerate
their businesses and guide them into the future to achieve better results.
Figure 2.9. Courses available on the INFINITECH marketplace.


Accelerator Programs 


The Accelerator Programs page brings together all the accelerator programs
on the INFINITECH Marketplace. Each accelerator program has its own page
with information that characterizes it, such as the services provided and their
beneficiaries, as well as the methodology and sector, among others. In addition,
users can learn more about the service through the website linked on the page.
Figure 2.10. Courses available on the INFINITECH marketplace.
Figure 2.11. Courses available on the INFINITECH marketplace.
INFINITECH Stakeholders Workshop Series: "BigData and Artificial Intelligence for Portfolio Risk Assessment"


INFINITECH Stakeholders Workshops Series: "Artificial Intelligence and Big Data analytics 
applied to Personalised, Usage Based and Configurable Insurance Products” 


INFINITECH Stakeholders Workshops Series: “Blockchain Applications for Digital Finance" 


INFINITECH Stakeholders Workshops Series: “Risk Profiling and Portfolio Optimization for 
broader Use Cases” 


Figure 2.12. Workshops on the INFINITECH marketplace. 


Figure 2.13. Webinars on the INFINITECH marketplace.


To build a community around the INFINITECH Marketplace and enrich it with
new information, the following functionalities were added to allow users to
interact with the platform:

* Social Login: the registration process was simplified and facilitated.
* Add new information: any INFINITECH user can provide information.


Social Login 


One of the objectives of INFINITECH is to create a digital finance ecosystem
of innovation, with IoT, Blockchain, BigData and AI solutions and services. For
this purpose, Social Login was implemented in the INFINITECH Marketplace
to connect stakeholders and expand the INFINITECH community, serving as an
entry point for the community to access and provide information.

Social Login offers simplified, quick and easy registration. Users can sign up to
the marketplace with their existing login information from third-party networks such
as Google, LinkedIn and GitHub (Figure 2.15).
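Google, LinkedIn and GitHub all support the OAuth 2.0 authorization-code flow, which is the usual basis of social login. The text does not say how the marketplace integrates these providers. The Python sketch below therefore only illustrates the two client-side steps of that flow: building the authorization URL with a random `state` value, and checking `state` when the provider redirects back with a code. The endpoint, client ID and redirect URI are placeholders. The subsequent exchange of the code for tokens at the provider's token endpoint is omitted.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholders: each provider publishes its endpoint and issues the client ID.
AUTHORIZE_ENDPOINT = "https://provider.example.com/oauth/authorize"
CLIENT_ID = "marketplace-client-id"
REDIRECT_URI = "https://marketplace.example.org/auth/callback"


def authorization_url(scope: str):
    """Build an OAuth 2.0 authorization-code request (RFC 6749, section 4.1.1)."""
    state = secrets.token_urlsafe(16)  # keep in the user's session; checked on callback
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state


def code_from_callback(callback_url: str, expected_state: str) -> str:
    """Return the authorization code only if the returned state matches (CSRF check)."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged callback")
    return params["code"][0]


url, state = authorization_url("openid email")
print(url.split("?")[0])  # https://provider.example.com/oauth/authorize
```

Scope names differ between providers, which is why `scope` is a parameter rather than a fixed value.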


Add New Information 


To enrich the digital finance ecosystem of innovation, forms were created to
give users the opportunity to share their solutions and services on the
INFINITECH Marketplace. It is important that the available information
continues to evolve and grow (see Figure 2.14).

Anyone can become part of this community by registering. As an INFINITECH
user, it is then possible to provide information to the INFINITECH Marketplace.
Different forms are available depending on the content the user wants to add
(assets, courses, workshops, webinars and accelerator programs).

Through the Homepage the users can access all the forms available in a single 
place (Figure 2.16) or go through the respective content page. 


Accelerator Programmes

Distributed Ledgers Research Centre (DLAC)
Gravity Ventures Incubator
Wayra
Visa Innovation Program
University of Valencia Science Park DIH
TVT innovation - Pôle Mer Méditerranée
The LHoFT
Sunrise Valley Digital innovation Hub (Sv DIH)
Start-up Nation Central
Start-up garage
SmartCityTech
Seedrocket
Santaka Artificial Intelligence DIH
Plug and Play
Nyuko a.s.b.l.
Nowel-T

Figure 2.14. Accelerator programs.
Figure 2.15. Social login.
Figure 2.16. Select form.


Marketplace Usage Scenarios 


The INFINITECH Marketplace is a good source of information, where users can
find solutions and services about IoT, AI, Blockchain and BigData in the context of
finance/insurance. It is very important to enrich the available information. For
this reason, users are able not only to provide information but also to take
advantage of it.

The marketplace can therefore be seen from two perspectives: the user as an
information provider and the user as an information consumer. Both perspectives
can be explored through the following scenarios, which are examples of INFINITECH
Marketplace usage:

* Scenario 1: A user who wants to add information to the INFINITECH
Marketplace.
* Scenario 2: A user who wants to know more about workshops in the finance
domain.


Scenario 1: Upload a VDIH 


Scenario 1 represents a user who wants to provide information to the INFINITECH
Marketplace, for example by uploading a blockchain course. To do so, the user needs
to follow a few steps:

* Step A: Sign in.
  o Step A1: If the user isn't registered, they will have to register first and then
  log in.
* Step B: Access the form, which can be done through:
  o Step B1: the Homepage, which gives access to all available forms.
  o Step B2: the VDIH page, which is organized into “Training” and “Innovation
  Services”. The user must go through Training to access the Courses page
  and select “Add New Course”.
* Step C: Fill in the Course form. The first fields of the form refer to the
provider (Your Information) and the others to the course (Course Information).
Mandatory fields are marked with an asterisk (*).
* Step D: Once submitted, the information is moderated by the consortium.
* Step E: The course is published on the INFINITECH Marketplace.
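Steps C to E above describe a small moderation workflow. A submission must contain all mandatory fields before it can be submitted, and it is published only after the consortium has reviewed it. The Python sketch below models that workflow as a state machine. The state names, the mandatory field set and the possibility of resubmitting a rejected entry are assumptions for illustration. The text only states that submissions are moderated and then published.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory ("*") fields of the Course form.
REQUIRED = {"provider_name", "email", "course_title", "course_url"}

# Allowed transitions; rejected -> submitted (resubmission) is an assumption.
TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"published", "rejected"},
    "rejected": {"submitted"},
    "published": set(),
}


@dataclass
class Submission:
    fields: dict
    state: str = "draft"
    log: list = field(default_factory=list)

    def move(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        if new_state == "submitted":
            # Step C: every mandatory field must be filled in before submission.
            filled = {k for k, v in self.fields.items() if v}
            missing = REQUIRED - filled
            if missing:
                raise ValueError(f"missing mandatory fields: {sorted(missing)}")
        self.log.append((self.state, new_state))
        self.state = new_state


course = Submission({"provider_name": "Example Org", "email": "info@example.org",
                     "course_title": "Blockchain basics",
                     "course_url": "https://example.org/course"})
course.move("submitted")   # Step D: sent to the consortium for moderation
course.move("published")   # Step E: approved and published
print(course.state, course.log)
# published [('draft', 'submitted'), ('submitted', 'published')]
```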


Figure 2.17 shows how a course added through the Course form appears on the
INFINITECH Marketplace.


Scenario 2: Consult a VDIH 


Scenario 2 represents a user who wants to know more about blockchain workshops
in the finance domain. The INFINITECH Marketplace has a VDIH section where
the user can consult workshops, as well as other types of content about IoT,
Blockchain, BigData and AI for finance and insurance (courses, webinars and
accelerator programs).
Figure 2.17. Introduction to hyperledger blockchain course.


On the Workshops page, the user can get an idea of the information present on
the platform through the names of the workshops and start by selecting one that
seems to be of interest. On the respective workshop page, the user can find
out more about the workshop by exploring the description and the agenda, among
other additional information, and follow only the parts that fit their interests. The
user can thus manage their time and later consult the presentation files that are
available online.

To obtain more detailed information, the user can contact the consortium and
receive all the information they need. Figure 2.18 shows an example of the
Workshops available on the INFINITECH Marketplace, where the sections mentioned
above can be seen.

Figure 2.18. INFINITECH stakeholders workshops series: "Blockchain applications for
digital finance".

2.3 Baseline Technologies


The back-end is the core of the market platform and has been developed using a
variety of technologies and tools. First, its components are containerized in
Docker images which, among other benefits, offer more efficient management and
maintenance and enable continuous updates and integration. Python is used as the
programming language, together with the Flask framework, a Web Server Gateway
Interface (WSGI) web application framework written in Python, to implement the
RESTful APIs that handle the respective HTTP requests.
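Flask applications are WSGI callables, which is what allows a WSGI server such as Gunicorn to host them. As a rough illustration of that interface, the sketch below uses only the Python standard library instead of Flask; the `/health` route and its response are hypothetical and are not part of the INFINITECH back-end.

```python
import json
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A minimal WSGI application with one hypothetical JSON route."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/health":  # hypothetical endpoint
        status, payload = "200 OK", {"status": "successful"}
    else:
        status, payload = "404 Not Found", {"status": "error", "message": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Development server only; in production a WSGI server such as Gunicorn loads `app`.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Saved as a module, say `backend_sketch.py` (a hypothetical name), the same callable could be served with `gunicorn --workers 4 --bind 127.0.0.1:8000 backend_sketch:app`, with NGINX in front as a reverse proxy.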

The offered assets are stored in a MongoDB NoSQL database, which is used in
combination with the file system of the host operating system to store and
retrieve large files/objects of any format. The initial implementation used the
GridFS specification, which was eventually dropped because it added an extra
delay to the back-end's responses when retrieving very large files. Moreover,
since Flask's built-in server is not suited to production use, the back-end is
served by Gunicorn, a Python WSGI HTTP Server for UNIX, together with NGINX, an
open-source high-performance HTTP web server and reverse proxy; together, these
tools allow multiple users to access the Flask application at the same time.
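One way to picture the split between MongoDB and the host file system: the large file is written to disk, and the database document only keeps a reference to it. In this sketch a plain dictionary stands in for the MongoDB document, and the file naming scheme and field names are hypothetical, not the back-end's actual layout.

```python
import hashlib
import os
import tempfile


def store_asset(data: bytes, filename: str, storage_dir: str) -> dict:
    """Write the raw file to disk and return a metadata document that references it.

    The returned dict stands in for a MongoDB document (hypothetical fields);
    only the path is kept in it, never the file content itself.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Prefix the name with part of the digest so equally named files do not collide.
    path = os.path.join(storage_dir, f"{digest[:16]}_{os.path.basename(filename)}")
    with open(path, "wb") as fh:
        fh.write(data)
    return {"filename": filename, "path": path, "size": len(data), "sha256": digest}


def load_asset(doc: dict) -> bytes:
    """Read the file back using the path stored in the metadata document."""
    with open(doc["path"], "rb") as fh:
        return fh.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = store_asset(b"col1,col2\n1,2\n", "example.csv", tmp)
        print(doc["size"], load_asset(doc) == b"col1,col2\n1,2\n")
```

Keeping large blobs out of the database documents means a metadata query never has to stream file content, which is the kind of delay the text attributes to GridFS.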

The front-end has been implemented using various web technologies (HTML, CSS,
etc.), with its functionality provided by PHP and JavaScript. It also uses
WordPress and several of its plugins to manage the presented content. [D8.4]


2.4 Interfaces (APIs) 


This section describes the REST API endpoints introduced in the final version of
the back-end. As already mentioned, the back-end is a REST API that receives
HTTP requests which trigger its functionalities. These APIs are categorized into
three main groups: (1) APIs related to Users, (2) APIs related to Descriptions
and (3) APIs related to Assets.


2.4.1 APIs Related to Users


This group of APIs offers functionalities for managing marketplace users. The
most important one is user registration, as it is required in order to use the
remaining functionalities. In addition to their personal information, all users
have a unique username. Table 2.1 presents the endpoints related to Users, as
they are in the first version of the marketplace.

Table 2.1. APIs of the back-end related to Users.
Action | HTTP Method | Endpoint
Register a new user (Sign up) | POST | {HOST}/accounts/users/registration
Check the availability of a username | GET | {HOST}/accounts/username/availability
Authenticate a user (Login) | POST | {HOST}/accounts/users/authentication
Get user's information | GET | {HOST}/accounts/users/information/{username}
Update user's information | PUT | {HOST}/accounts/users/information/{username}
Change user's password | POST | {HOST}/accounts/users/password/change
Reset user's password | POST | {HOST}/accounts/users/password/reset
Delete user's account | DELETE | {HOST}/accounts/users/delete/{username}
Get user's privacy data | GET | {HOST}/accounts/users/data
Erase user's data | DELETE | {HOST}/accounts/users/data
Set user's profile as public/private | PUT | {HOST}/accounts/users/profile/accessibility
Connect user's account to a social service | POST | {HOST}/accounts/users/sso/{sso_service}/connect
Disconnect user's account from a social service | POST | {HOST}/accounts/users/sso/{sso_service}/disconnect


* {HOST} refers to the hosting server: the domain name and the port on which the
back-end runs.

* {username} refers to the user's unique username.

* {sso_service} refers to a supported SSO provider (Google, LinkedIn or GitHub).

* Some of these actions require additional fields in the headers or even in the
body of the HTTP request, for example the API key that users provide to
authenticate themselves to the platform.
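The placeholders in Table 2.1 can be filled in programmatically. This sketch expands an endpoint template with `str.format` and percent-encodes each path value so it stays within one path segment; the helper name and the example host are hypothetical.

```python
from urllib.parse import quote


def build_url(host: str, template: str, **params: str) -> str:
    """Expand an endpoint template such as '/accounts/users/information/{username}'.

    `host` plays the role of {HOST} (scheme, domain name and port); every other
    placeholder value is percent-encoded, including any '/' it contains.
    """
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return host.rstrip("/") + template.format(**encoded)


if __name__ == "__main__":
    host = "http://localhost:5000"  # hypothetical {HOST}
    print(build_url(host, "/accounts/users/information/{username}", username="jdoe"))
    print(build_url(host, "/accounts/users/sso/{sso_service}/connect", sso_service="github"))
```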


Tables 2.2 to 2.14 describe each of the interfaces listed in the previous table in
more detail, one table per interface.


Table 2.2. Interface: Register a new user (Sign up).
Title: Register a new user (Sign up)
Endpoint: {HOST}/accounts/users/registration
HTTP Method: POST
Description: This endpoint handles the registration of marketplace users. A POST request
should be submitted with the following JSON schema as raw data in its body. Note that
(a) the email and the username must be unique and available, and (b) the request must
follow the schema below exactly, whether its fields have values or not (using empty
strings "" for missing values), while the array "social" may be empty.
{
  "username": "...",
  "account": {"password": "..."},
  "info": {"first_name": "...", "last_name": "...", "email": "...", "about": "...", "social": []}
}
Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 
None None 
URL Parameters: Parameter Value 
None None 
Query Parameters: None 


Restrictions/Special Features: None 
Successful Response: JSON Object with a successful message. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/accounts/users/registration' \
--header 'Content-Type: application/json' \
--data-raw '{
  "username": "...",
  "account": {"password": "..."},
  "info": {"first_name": "...", "last_name": "...", "email": "...", "about": "...", "social": []}
}'


After a successful registration, the following JSON document is stored in the 
database: 


{
  "username": "...",                 // user's unique username
  "account": {
    "password": "...",               // user's password (hashed)
    "password_protected": "...",     // whether the account is password protected or not (values 1 or 0)
    "connections": {"google": "...", ...},  // object that determines whether the account is connected
                                     // to any of the supported SSO services (e.g. Google, GitHub, LinkedIn)
    "role": "user",                  // user's role (user or admin)
    "verified": "...",               // "1" if the user is verified; otherwise, a verification code
                                     // used for the user's email/account verification
    "registration_datetime": "...",  // user's registration date
    "public_profile": "..."          // whether the account can be publicly displayed or not (values 1 or 0)
  },
  "info": {                          // info provided during the user's registration
    "first_name": "...", "last_name": "...", "email": "...", ...
  }
}
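Because the registration endpoint expects the complete schema even when some values are empty, a client can start from a full template and overwrite only the fields it has. The sketch below builds the POST request with `urllib.request` without sending it. The field names under "info" follow the schema above only as far as it is legible and should be treated as assumptions.

```python
import copy
import json
import urllib.request

# Assumed registration template: every key is always present, empty strings
# stand for missing values, and "social" may stay an empty list.
REGISTRATION_TEMPLATE = {
    "username": "",
    "account": {"password": ""},
    "info": {"first_name": "", "last_name": "", "email": "", "about": "", "social": []},
}


def registration_request(host: str, username: str, password: str, **info) -> urllib.request.Request:
    """Build (but do not send) the sign-up request for {HOST}/accounts/users/registration."""
    body = copy.deepcopy(REGISTRATION_TEMPLATE)
    body["username"] = username
    body["account"]["password"] = password
    for key, value in info.items():
        if key not in body["info"]:
            # Refuse unknown fields so the body keeps exactly the expected schema.
            raise KeyError(f"unexpected info field: {key}")
        body["info"][key] = value
    return urllib.request.Request(
        host.rstrip("/") + "/accounts/users/registration",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running back-end, `urllib.request.urlopen(registration_request("http://localhost:5000", "jdoe", "s3cret", email="jdoe@example.org"))` would send it; the host shown is hypothetical.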
Table 2.3. Interface: Check the availability of a username.
Title: Check the availability of a username
Endpoint: {HOST}/accounts/username/availability
HTTP Method: GET
Description: This endpoint is used to check the availability of a username during the
users' registration. A GET request should be made, and the key "x-username" must be
included in the headers of the request.
Body Data: None
Headers: Key Value
x-username  The username whose availability will be checked.
URL Parameters: Parameter Value
None None
Query Parameters: None
Restrictions/Special Features: None
Successful Response: Availability status in a JSON Object.


The following is an example of the request in cURL:

curl --request GET '{HOST}/accounts/username/availability' --header 'x-username: <value>'
Table 2.4. Interface: Authenticate a user (Login). 
Title: Authenticate a user (Login) 
Endpoint: {HOST}/accounts/users/authentication 
HTTP Method: POST 
Description: Through this endpoint, users are authenticated in order to log in to their
account. A POST request should be made with the following JSON schema, containing the
user's credentials, as raw data in its body. Users can log in either with their email or
with their username. Finally, single sign-on (SSO) schemes are supported (only through
the front-end) for registration and login, using social media accounts (namely Google,
LinkedIn and GitHub).
{"username": "...", "email": "...", "password": "..."}
A successful response returns the following JSON schema, which contains the API key in
the key "token":
{"status": "successful", "token": "<api_key>"}
Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 
None None 
URL Parameters: Parameter Value 
None None 
Query Parameters: None 
Restrictions/Special Features: None 


Successful Response: 


JSON Object with a successful message and the API key. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/accounts/users/authentication' \
--header 'Content-Type: application/json' \
--data-raw '{"username": "...", "email": "...", "password": "..."}'
Table 2.5. Interface: Get user's information.
Title: Get user's information
Endpoint: {HOST}/accounts/users/information/{username}
HTTP Method: GET
Description: This endpoint is used to retrieve information about a user, such as first and
last name, the "about" section, email, etc. (an example of retrieved user information is
illustrated below). A GET request should be made, with the user's {username} at the end
of the endpoint. Moreover, this endpoint is restricted and thus, requesters' API keys must
be included in the headers of the request.
Body Data: Raw (JSON) Data, as in the above schema (Content-Type: application/json)
Headers: Key Value
API KEY  Requester's API key.
URL Parameters: Parameter Value
username  The username of the user whose information will be retrieved.
Query Parameters: None
Restrictions/Special Features: The administrators and the accounts' owners are able to
retrieve all of a user's information, while users who retrieve the information of other
users receive only public information.
Successful Response: JSON Object with a user's information.


The following is an example of the request in cURL:

curl --request GET '{HOST}/accounts/users/information/{username}' \
--header 'API KEY: <value>'

Example of a successful response:

{"status": "successful", "result": {
  "account": {"registration_datetime": "...", "role": "user", "verified": "1",
              "connections": {"google": "..."}},
  "info": {"about": "...", "email": "...", ...},
  "username": "..."
}}
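Tables 2.4 and 2.5 together describe a typical client flow: log in, read the API key from the "token" field of the response, then call a restricted endpoint such as Get user's information. The sketch below builds those requests with Python's standard library without sending them, so the token parsing can be shown on its own. The header name `API_KEY` is an assumption: the tables print it as "API KEY", but a space is not allowed in an HTTP header name. The "status" check assumes the login response has the same shape as the update response in Table 2.6.

```python
import json
import urllib.request
from urllib.parse import quote

API_KEY_HEADER = "API_KEY"  # assumption: printed as "API KEY" in the tables


def login_request(host, password, username="", email=""):
    """Build the POST request for {HOST}/accounts/users/authentication.

    Users may log in with either their username or their email.
    """
    if not (username or email):
        raise ValueError("a username or an email is required")
    body = {"username": username, "email": email, "password": password}
    return urllib.request.Request(
        host.rstrip("/") + "/accounts/users/authentication",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body):
    """Return the API key carried under "token" in a successful login response."""
    payload = json.loads(response_body)
    if payload.get("status") != "successful" or "token" not in payload:
        raise RuntimeError(f"login failed: {payload}")
    return payload["token"]


def user_info_request(host, token, username):
    """Build the restricted GET request for {HOST}/accounts/users/information/{username}."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/accounts/users/information/{quote(username, safe='')}",
        headers={API_KEY_HEADER: token},
        method="GET",
    )
```

With a running back-end, `urllib.request.urlopen(login_request(...))` would send the login request, and the body it returns can be passed to `extract_token`.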

Table 2.6. Interface: Update user's information 

Title: Update user's information 
Endpoint: {HOST}/accounts/users/information/{username} 
HTTP Method: PUT 
Description: This endpoint handles requests for updating users' information. A PUT request
should be made with the following JSON schema, containing the user's new information, as
raw data in its body. The schema is flexible: it may contain fewer fields, but no new fields.
{"info": {"first_name": "...", "last_name": "...", "email": "...", "about": "...", "social": []}}
Moreover, this endpoint is restricted and thus, users' API keys must be included in the
headers of the request. Only the accounts' owners and the administrators are able to
update the information of a user.
Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 


API KEY  Requester's API key.
URL Parameters: Parameter Value 

username The username of the user whose information will be updated. 
Query Parameters: None 


Restrictions/Special Features: Only the accounts’ owners and the administrators are able to update the 
information of a user. 


Successful Response: A successful response returns the following JSON Object, which
contains a new API key in the key "token":
{"status": "successful", "message": "The information of the user with username
'{username}' has been updated.", "token": "<api_key>"}


The following is an example of the request in cURL:

curl --request PUT '{HOST}/accounts/users/information/{username}' \
--header 'API KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{"info": {"first_name": "...", "last_name": "...", "about": "..."}}'
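The update endpoint accepts a subset of the registration fields but rejects new ones. A client-side check for that rule might look like the sketch below. The top-level "info" key and the set of allowed fields are assumptions based on the registration schema, and the check is illustrative rather than the back-end's own validation.

```python
ALLOWED_INFO_FIELDS = {"first_name", "last_name", "email", "about", "social"}  # assumed


def validate_update(body: dict) -> list:
    """Return a list of problems; an empty list means the body may be sent.

    Fewer fields than the full schema are accepted, but unknown fields are not.
    """
    problems = []
    extra_top = sorted(set(body) - {"info"})
    if extra_top:
        problems.append(f"unexpected top-level fields: {extra_top}")
    info = body.get("info", {})
    if not isinstance(info, dict):
        problems.append('"info" must be a JSON object')
    else:
        unknown = sorted(set(info) - ALLOWED_INFO_FIELDS)
        if unknown:
            problems.append(f"unexpected info fields: {unknown}")
    return problems
```

Checking on the client only saves a round trip; the back-end still has to enforce the same rule.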


Table 2.7. Interface: Change user's password. 


Title: Change user's password

Endpoint: {HOST}/accounts/users/password/change

HTTP Method: POST

Description: This endpoint is used when users want to change their account's password. A
POST request should be made with the following JSON schema, containing the user's old and
new credentials, as raw data in its body. This endpoint is restricted and thus, users' API
keys must be included in the headers of the request. This action is only available to
accounts' owners.
{"old_password": "...", "new_password": "...", "confirm_new_password": "..."}
Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 
API KEY  Requester's API key. 
URL Parameters: Parameter Value 
None None 


Query Parameters: None 


Restrictions/Special Features: Only available to accounts' owners. The new password must not be the same
as the previous password.
Successful Response: JSON Object with a successful message.
The following is an example of the request in cURL:

curl --request POST '{HOST}/accounts/users/password/change' \
--header 'API KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{"old_password": "...", "new_password": "...", "confirm_new_password": "..."}'


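The change-password rules can be checked before the request is sent. In the sketch below, the rule that the new password must differ from the old one comes from the restrictions of Table 2.7, while the check that the confirmation matches is an assumption suggested by the "confirm new password" field; the snake_case field names are also assumed.

```python
def password_change_errors(old_password, new_password, confirm_new_password):
    """Return the reasons a change-password request would be rejected (empty if none)."""
    errors = []
    # Assumed rule: the confirmation must repeat the new password exactly.
    if new_password != confirm_new_password:
        errors.append("new password and confirmation do not match")
    # Stated rule: the new password must not be the same as the previous password.
    if new_password == old_password:
        errors.append("the new password must differ from the previous one")
    return errors


def password_change_body(old_password, new_password, confirm_new_password):
    """Build the JSON body for {HOST}/accounts/users/password/change (field names assumed)."""
    errors = password_change_errors(old_password, new_password, confirm_new_password)
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "old_password": old_password,
        "new_password": new_password,
        "confirm_new_password": confirm_new_password,
    }
```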
Table 2.8. Interface: Reset user's password. 


Title: Reset user's password 

Endpoint: {HOST}/accounts/users/password/reset 

HTTP Method: POST 

Description: This endpoint handles the process of changing users' passwords after a
password reset request. It works in combination with the front-end, which first sends
the users an email with a password reset link that redirects to a form where they can
set their new password. The front-end then sends a POST request to the back-end with
the user's new credentials (username and password) and the password reset code.
{"username": "...", "password": "...", "password_reset_code": "..."}
Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 
None None 
URL Parameters: Parameter Value 
None None 
Query Parameters: None 
Restrictions/Special Features: None 
Successful Response: JSON Object with a successful message. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/accounts/users/password/reset' \
--header 'Content-Type: application/json' \
--data-raw '{"username": "...", "password": "...", "password_reset_code": "..."}'


Table 2.9. Interface: Delete user's account. 


Title: Delete user's account 

Endpoint: {HOST}/accounts/users/delete/{username}

HTTP Method: DELETE

Description: To delete an account, a DELETE request should be made to this endpoint,
providing the requester's password as raw data (JSON format) in its body. The username
of the user whose account will be deleted must be appended to the end of the URL.
{"password": "..."}
The endpoint is restricted and thus, the requester's API key must be included in the
headers of the request. This action is available to accounts' owners and to
administrators, who are able to delete users from the marketplace. If the action is
made by an administrator, the value of the field "password" in the body should be the
administrator's password.
An important note is that deleting an account also deletes all of the user's data,
offered descriptions and assets.
Body Data: Raw (JSON) Data, as in the above schema (Content-Type: application/json)
Headers: Key Value
API KEY  Requester's API key.
URL Parameters: Parameter Value
username  The username of the user whose account will be deleted.
Query Parameters: None
Restrictions/Special Features: Only the accounts' owners and the administrators are able
to delete an account/user.
Successful Response: JSON Object with a successful message.


The following is an example of the request in cURL:

curl --request DELETE '{HOST}/accounts/users/delete/{username}' \
--header 'API KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{"password": "..."}'


Table 2.10. Interface: Get user’s privacy data. 


Title: Get user's privacy data
Endpoint: {HOST}/accounts/users/data
HTTP Method: GET
Description: This endpoint, which is available only to accounts' owners, returns all the
personalized data of the requester, including the user's information and various
metadata. The endpoint is restricted and thus, the API key of the requester must be
included in the headers of the request.
Body Data: None
Headers: Key Value
API KEY  Requester's API key.
URL Parameters: Parameter Value
None None
Query Parameters: None
Restrictions/Special Features: Available only to accounts' owners.
Successful Response: A JSON Object with the user's data.


The following is an example of the request in cURL:

curl --request GET '{HOST}/accounts/users/data' --header 'API KEY: <value>'
Table 2.11. Interface: Erase user's data.


Title: Erase user’s data 

Endpoint: {HOST}/accounts/users/data 

HTTP Method: DELETE 

Description: This endpoint, which is available only to accounts’ owners, erases all the 


personalized data of the requester. The endpoint is restricted and thus, the 
API key of the requester must be included in the headers of the request. 


Body Data: None 
Headers: Key Value 
API KEY  Requester’s API key. 
URL Parameters: Parameter Value 
None None 
Query Parameters: None 


Restrictions/Special Features: Available only to accounts’ owners. 


Successful Response: JSON Object with a successful message. 

The following is an example of the request in cURL:
curl --request DELETE '{HOST}/accounts/users/data' --header 'API KEY: <value>'


Table 2.12. Interface: Set user's profile as public/private. 


Title: Set user's profile as public/private 

Endpoint: {HOST}/accounts/users/profile/accessibility 

HTTP Method: PUT 

Description: This endpoint, which is available only to accounts' owners, makes the user's
account publicly accessible or private, depending on the user's preferences. The
account's accessibility is indicated in the headers of the request under the
"accessibility" key. Moreover, the endpoint is restricted and thus, the API key of the
requester must be included in the headers of the request.


Body Data: None 
Headers: Key Value 


API KEY Requester's API key. 


accessibility  Determines whether the account will be accessible or not.
Acceptable values:
* "1": public account
* "0": private account


URL Parameters: Parameter Value 
None None 
Query Parameters: None 


Restrictions/Special Features: Available only to accounts’ owners. 
Successful Response: JSON Object with a successful message. 
The following is an example of the request in cURL:

curl --request PUT '{HOST}/accounts/users/profile/accessibility' \
--header 'API KEY: <value>' --header 'accessibility: <value>'
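Note that this interface sends its choice in a header rather than in a body and uses the PUT method. A minimal sketch of building that request with the standard library follows; the header name `API_KEY` is an assumption (the tables print "API KEY", and spaces are not valid in HTTP header names), and the host used in the usage line is hypothetical.

```python
import urllib.request

API_KEY_HEADER = "API_KEY"  # assumption: printed as "API KEY" in the tables
ACCESSIBILITY_VALUES = {True: "1", False: "0"}  # "1": public account, "0": private account


def accessibility_request(host: str, token: str, public: bool) -> urllib.request.Request:
    """Build the PUT request that makes a profile public or private.

    The choice travels in the "accessibility" header; the request has no body.
    """
    return urllib.request.Request(
        host.rstrip("/") + "/accounts/users/profile/accessibility",
        headers={API_KEY_HEADER: token, "accessibility": ACCESSIBILITY_VALUES[public]},
        method="PUT",
    )
```

For example, `accessibility_request("http://localhost:5000", token, public=False)` prepares a request that would turn the profile private.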
Table 2.13. Interface: Connect user's account to a social service.


Title: Connect user’s account to a social service 

Endpoint: {HOST}/accounts/users/sso/{sso_service}/connect 

HTTP Method: POST 

Description: This endpoint, which is available only to accounts' owners, connects an
existing marketplace account to a social service through the single sign-on (SSO)
functionality. This action is not a sign-up/registration action; registration can only
be done through the "Register a new user (Sign up)" interface (Table 2.2). The currently
supported SSO services are Google, LinkedIn and GitHub.
The body of the request must contain the unique ID of the user in the corresponding
{sso_service} with which the connection will be made:
{"id": "..."}
Moreover, the endpoint is restricted and thus, the API key of the requester must be
included in the headers of the request.


Body Data: Raw (JSON) Data — as the above schema (Content-Type: application/json) 
Headers: Key Value 


API KEY  Requester's API key.
URL Parameters: Parameter Value
sso_service  The name of the SSO service. Acceptable values: "google",
"linkedin", "github".
Query Parameters: None 
Restrictions/Special Features: Available only to accounts’ owners. 
Successful Response: JSON Object with a successful message. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/accounts/users/sso/{sso_service}/connect' \
--header 'API KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{"id": "..."}'
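Since only three SSO services are accepted, a client can reject anything else before building the URL. The sketch below does that and prepares the connect request; the header name `API_KEY` is an assumption (the tables print "API KEY"), and the provider user ID passed in would come from the SSO provider.

```python
import json
import urllib.request

SUPPORTED_SSO_SERVICES = ("google", "linkedin", "github")
API_KEY_HEADER = "API_KEY"  # assumption: printed as "API KEY" in the tables


def sso_connect_request(host, token, sso_service, provider_user_id):
    """Build the POST request for {HOST}/accounts/users/sso/{sso_service}/connect."""
    if sso_service not in SUPPORTED_SSO_SERVICES:
        raise ValueError(f"unsupported SSO service: {sso_service!r}")
    return urllib.request.Request(
        f"{host.rstrip('/')}/accounts/users/sso/{sso_service}/connect",
        data=json.dumps({"id": provider_user_id}).encode("utf-8"),
        headers={API_KEY_HEADER: token, "Content-Type": "application/json"},
        method="POST",
    )
```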


Table 2.14. Interface: Disconnect user's account from a social service. 
Title: Disconnect user's account from a social service 
Endpoint: {HOST}/accounts/users/sso/{sso_service}/disconnect 
HTTP Method: POST 


Description: This endpoint, which is available only to accounts’ owners, disconnects a social 
account/SSO service from the requester’s account. This action is possible only if the users 
have set a password for their accounts. If their accounts are not password protected, then 
the users must set a password through the “reset password” interface (Table 2.8), before 
disconnecting the services from their accounts. 

The current supported SSO services are the following: Google, LinkedIn and GitHub. 
Moreover, the endpoint is restricted and thus, the API key of the requester must be 
included in the headers of the request. 


Body Data: None 
Headers: Key Value 


API KEY Requester’s API key. 


URL Parameters: Parameter Value
sso_service  The name of the SSO service. Acceptable values: "google",
"linkedin", "github".
Query Parameters: None
Restrictions/Special Features: Available only to accounts' owners.
Successful Response: JSON Object with a successful message.
The following is an example of the request in cURL:
curl --request POST '{HOST}/accounts/users/sso/{sso_service}/disconnect' \
--header 'API KEY: <value>'


2.4.2 APIs Related to Descriptions 


This group of APIs offers functionalities for managing descriptions. They support all
CRUD operations as well as search. Special emphasis was placed on the APIs for
retrieving descriptions, which were extended to return the latest or random
descriptions, either from a specific collection (database collection) or from all
collections at once using the keyword "all". The collections of the database, as
described in deliverable D8.2 "Market Platform and VDIH Specifications - II", vary along
with the types of assets offered by the marketplace. The current list of collections is
given right after Table 2.15, which presents the endpoints related to Descriptions. The
details of each interface can be found in Tables 2.16 to 2.37, respectively.


Table 2.15. APIs of the back-end related to Descriptions.
Action | HTTP Method | Endpoint
Get a list with all descriptions | GET | {HOST}/descriptions/all
Get a list with all descriptions from a specific collection | GET | {HOST}/descriptions/{collection}
Get a specific description (using keyword "all") | GET | {HOST}/descriptions/all/{description_id}
Get a specific description (using description's "collection") | GET | {HOST}/descriptions/{collection}/{description_id}
Get the latest descriptions from all collections | GET | {HOST}/descriptions/all/latest
Get the latest descriptions from a specific collection | GET | {HOST}/descriptions/{collection}/latest
Get random descriptions from all collections | GET | {HOST}/descriptions/all/random
Get random descriptions from a specific collection | GET | {HOST}/descriptions/{collection}/random
Upload/Create a new description with random ID | POST | {HOST}/descriptions/{collection}
Upload/Create a new description with given ID | POST | {HOST}/descriptions/{collection}/{given_id}
Update a specific description (using keyword "all") | PUT | {HOST}/descriptions/all/{description_id}
Update a specific description (using description's "collection") | PUT | {HOST}/descriptions/{collection}/{description_id}
Delete a specific description (using keyword "all") | DELETE | {HOST}/descriptions/all/{description_id}
Delete a specific description (using description's "collection") | DELETE | {HOST}/descriptions/{collection}/{description_id}
Delete all descriptions | DELETE | {HOST}/descriptions/all/all
Delete all descriptions from a specific collection | DELETE | {HOST}/descriptions/{collection}/all
Get a list with all descriptions that need permission (administrators' action) | GET | {HOST}/descriptions/permit/all
Get a list with all descriptions from a specific collection that need permission (administrators' action) | GET | {HOST}/descriptions/permit/{collection}
Approve or reject a description that needs permission, using keyword "all" (administrators' action) | POST | {HOST}/descriptions/permit/all/{description_id}
Approve or reject a description that needs permission, using description's "collection" (administrators' action) | POST | {HOST}/descriptions/permit/{collection}/{description_id}
Approve or reject all descriptions that need permission, using keyword "all" (administrators' action) | POST | {HOST}/descriptions/permit/all/all
Approve or reject all descriptions that need permission under a specific collection, using a "collection" value (administrators' action) | POST | {HOST}/descriptions/permit/{collection}/all
* {HOST} refers to the hosting server: the domain name and the port on which the
back-end runs.

* {description_id} refers to the ID of a specific description.

* {given_id} is used in the "upload description" action, providing the new
description's ID.

* A {collection} can be one of the following values, derived from the current
types of offered assets:

{"algorithms", "notebooks", "ml-models", "datasets", "apis", "webinars", "applications",
"blockchain", "third-party-tools", "whitepapers", "how-to-videos", "accelerator-programmes",
"innovation-support-services", "workshops", "courses", "containers", "uncategorized", "other"}

* Some of these actions require additional fields in the headers or even in the
body of the HTTP request, for example the API key that users provide to
authenticate themselves to the platform.
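The collection names above can double as a client-side whitelist when building the description endpoints of Table 2.15. In this sketch the helper name is hypothetical; it accepts either the keyword "all" or a listed collection, and an optional trailing segment such as a description ID, "latest" or "random".

```python
from urllib.parse import quote

COLLECTIONS = frozenset({
    "algorithms", "notebooks", "ml-models", "datasets", "apis", "webinars",
    "applications", "blockchain", "third-party-tools", "whitepapers",
    "how-to-videos", "accelerator-programmes", "innovation-support-services",
    "workshops", "courses", "containers", "uncategorized", "other",
})


def descriptions_url(host, collection="all", suffix=None):
    """Build {HOST}/descriptions/{collection}[/{suffix}].

    `collection` is the keyword "all" or one of COLLECTIONS; `suffix` can be a
    description ID or a fixed segment such as "latest" or "random".
    """
    if collection != "all" and collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    url = f"{host.rstrip('/')}/descriptions/{collection}"
    if suffix is not None:
        url += "/" + quote(str(suffix), safe="")
    return url
```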

Table 2.16. Interface: Get a list with all descriptions. 
Title: Get a list with all descriptions 
Endpoint: {HOST}/descriptions/all 
HTTP Method: GET 
Description: A GET request to this endpoint retrieves the stored descriptions from all
collections. It uses the keyword "all" instead of a specific collection, which makes the
platform retrieve descriptions from all collections at once. The descriptions returned by
this request follow a short schema (short description), meaning that the retrieved
information is limited. An example of a description in the short schema is the following:
{"collection": "datasets", "id": "datasets_vlLZ2WaogNIFe",
 "info": {"title": "Example title", "keywords": ["information"], "owner": "Vasilis Koukos",
          "short_desc": "This is an example", "type": "dataset"},
 "metadata": {"provider": "vkoukos", "updateDate": "...", "uploadDate": "...", "views": 35}
}
This endpoint accepts query parameters to search for descriptions that meet certain
conditions. Any key-value pair can be used as a query parameter, and additional search
operators are available for more advanced searches. More details about searching can be
found in Section 3.2.3. This endpoint also offers some standard query parameters, as
described below (Query Parameters).
Body Data: None 
Headers: Key Value 


None None 


URL Parameters: Parameter Value
None None
Query Parameters: Key Value
sortBy  [Optional] Sorts the descriptions by a field; the default is the "newest" key.
The value should be one of the following:
  "newest": sort by date in descending order.
  "oldest": sort by date in ascending order.
  "views-asc": sort by the number of views in ascending order.
  "views-desc": sort by the number of views in descending order.
  "title": sort by title in ascending order.
itemsPerPage  [Optional] Returns the results split into pages (arrays) of N items, where N
is the value of this key. N must be an integer greater than or equal to 1. If the key is
not used or has a non-accepted value, the results are returned on a single page.
page  [Optional] This key can only be used together with "itemsPerPage". When used, only
the specified page is returned instead of all the pages created by "itemsPerPage". Valid
values are integers greater than or equal to 1; the default value, 0, means that all
pages are returned.
Any key to search (refer to Section 3.2.3)  Any value to search (refer to Section 3.2.3).
Restrictions/Special Features: None
Successful Response: A JSON Object with the results (all descriptions from all
collections). If the query parameter "itemsPerPage" is used, the results also contain the
total number of pages.
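The sortBy, itemsPerPage and page rules can be restated as a small function over in-memory records, which makes the default behaviours explicit. This is a sketch of the documented semantics, not the back-end's code. It assumes that "date" means metadata.uploadDate stored as an ISO 8601 string (so string order is date order), and the returned {"pages", "results"} shape is hypothetical.

```python
def sort_and_paginate(descriptions, sort_by="newest", items_per_page=None, page=0):
    """Apply the sortBy / itemsPerPage / page rules to a list of short-schema records."""
    sort_keys = {
        "newest": (lambda d: d["metadata"]["uploadDate"], True),
        "oldest": (lambda d: d["metadata"]["uploadDate"], False),
        "views-asc": (lambda d: d["metadata"]["views"], False),
        "views-desc": (lambda d: d["metadata"]["views"], True),
        "title": (lambda d: d["info"]["title"], False),
    }
    key, reverse = sort_keys.get(sort_by, sort_keys["newest"])  # "newest" is the default
    ordered = sorted(descriptions, key=key, reverse=reverse)

    # A missing or non-accepted itemsPerPage returns everything on a single page.
    if not isinstance(items_per_page, int) or items_per_page < 1:
        return {"pages": 1, "results": [ordered]}

    pages = [ordered[i:i + items_per_page] for i in range(0, len(ordered), items_per_page)]
    pages = pages or [[]]  # keep one (empty) page when there are no results
    if isinstance(page, int) and 1 <= page <= len(pages):
        return {"pages": len(pages), "results": [pages[page - 1]]}
    # page == 0 (the default) returns all pages; this sketch does the same for out-of-range values.
    return {"pages": len(pages), "results": pages}
```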


The following are examples of the request in cURL:

curl --request GET '{HOST}/descriptions/all'
curl --request GET '{HOST}/descriptions/all?sortBy={value}'
curl --request GET '{HOST}/descriptions/all?itemsPerPage={value}'
curl --request GET '{HOST}/descriptions/all?itemsPerPage={value}&page={value}'
curl --request GET '{HOST}/descriptions/all?sortBy={value}&itemsPerPage={value}'
curl --request GET '{HOST}/descriptions/all?sortBy={value}&itemsPerPage={value}&page={value}'

Example of retrieving the 10 most viewed descriptions:

curl --request GET '{HOST}/descriptions/all?sortBy=views-desc&itemsPerPage=10&page=1'
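The query strings in these examples can be assembled with `urllib.parse.urlencode`, which also escapes free-form search values. In this sketch the helper name is hypothetical, and "page" is only added when "itemsPerPage" is present, mirroring the rule in Table 2.16.

```python
from urllib.parse import urlencode


def list_descriptions_url(host, collection="all", sort_by=None,
                          items_per_page=None, page=None, **search):
    """Build a GET URL for {HOST}/descriptions/{collection} with optional query parameters."""
    params = {}
    if sort_by is not None:
        params["sortBy"] = sort_by
    if items_per_page is not None:
        params["itemsPerPage"] = items_per_page
        if page is not None:
            params["page"] = page  # only meaningful together with itemsPerPage
    params.update(search)  # free-form search keys (see Section 3.2.3)
    url = f"{host.rstrip('/')}/descriptions/{collection}"
    return f"{url}?{urlencode(params)}" if params else url
```

For instance, `list_descriptions_url(host, sort_by="views-desc", items_per_page=10, page=1)` reproduces the "10 most viewed descriptions" request above.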
Table 2.17. Interface: Get a list with all descriptions from a specific collection.
Title: Get a list with all descriptions from a specific collection
Endpoint: {HOST}/descriptions/{collection}
HTTP Method: GET
Description: This GET request is similar to the previous one. The only difference is that
this request retrieves descriptions from a single, specific collection instead of using
the keyword "all". For more details, refer to the previous endpoint.
Body Data: None
Headers: Key Value
None None
URL Parameters: Parameter Value
collection  Valid values:
{"algorithms", "notebooks", "ml-models", "datasets", "apis", "webinars", "applications",
"blockchain", "third-party-tools", "whitepapers", "how-to-videos", "accelerator-programmes",
"innovation-support-services", "workshops", "courses", "containers", "uncategorized", "other"}
Query Parameters: As in the previous request.
Restrictions/Special Features: None
Successful Response: A JSON Object with the results (all descriptions in a specific collection).


The following is an example of the request in CURL: 


+ curl --request GET '{HOST}/descriptions/{collection}' 


+ curl --request GET \ 


'(HOST)/descriptions/(collection)?sortBy-(value)&itemsPerPage-(value)&p 


age={value}' 
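Because an unknown `{collection}` value is a common source of failed requests, a client can check it before calling the endpoint. This Python sketch copies the valid values listed in the table; the constant `VALID_COLLECTIONS`, the helper `collection_url` and the example host are hypothetical names, not part of the Marketplace API.

```python
VALID_COLLECTIONS = frozenset({
    "algorithms", "notebooks", "ml-models", "datasets", "apis",
    "webinars", "applications", "blockchain", "third-party-tools",
    "whitepapers", "how-to-videos", "accelerator-programmes",
    "innovation-support-services", "workshops", "courses",
    "containers", "uncategorized", "other",
})


def collection_url(host: str, collection: str) -> str:
    """Return {HOST}/descriptions/{collection}, rejecting unknown collection names early."""
    # The keyword "all" selects every collection; anything else must be a listed collection.
    if collection != "all" and collection not in VALID_COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    return f"{host.rstrip('/')}/descriptions/{collection}"


print(collection_url("https://marketplace.example", "ml-models"))
```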


Table 2.18. Interface: Get a specific description (using keyword “all”).

Title: Get a specific description (using keyword “all”)
Endpoint: {HOST}/descriptions/all/{description_id}
HTTP Method: GET
Description: With this GET request, users can retrieve a specific description using its unique identification code (ID), which is known when the description is uploaded. A specific description can be retrieved using either the keyword “all” or the name of the collection in which the description has been stored (next interface). This is feasible because the back-end ensures that IDs are unique regardless of the collection in which a description has been stored.

Moreover, an API key is required to retrieve a description in its “full schema”. If the requester’s API key is missing, the endpoint returns the short schema of the description. An example of the full schema of a description is given in the endpoint that handles the uploading of a description (Table 2.24).
Body Data: None

(Continued)

Table 2.18. Continued

Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
description_id The ID of the description that will be retrieved.
Query Parameters: None
Restrictions/Special Features: The full schema is available only to authenticated (and verified) users; otherwise, the short schema is available to all.
Successful Response: A JSON Object with the description in the results.
The following is an example of the request in CURL:
• curl --request GET '{HOST}/descriptions/all/{description_id}'
• curl --request GET '{HOST}/descriptions/all/{description_id}' --header 'API_KEY: <value>'


Table 2.19. Interface: Get a specific description (using description’s “collection”).

Title: Get a specific description (using description’s “collection”)
Endpoint: {HOST}/descriptions/{collection}/{description_id}
HTTP Method: GET
Description: This GET request is similar to the previous one, with the difference that it uses the description’s collection to retrieve a description (instead of the keyword “all”). The value of {collection} must be the collection in which the description has been stored. More information about the endpoint can be found in the previous endpoint.
Body Data: None
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
description_id The ID of the description that will be retrieved.
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}
Query Parameters: None
Restrictions/Special Features: The full schema is available only to authenticated (and verified) users; otherwise, the short schema is available to all.
Successful Response: A JSON Object with the description in the results.
The following is an example of the request in CURL:
• curl --request GET '{HOST}/descriptions/{collection}/{description_id}'
• curl --request GET '{HOST}/descriptions/{collection}/{description_id}' \
--header 'API_KEY: <value>'
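The two retrieval endpoints differ only in the path segment and in whether the `API_KEY` header is sent. The Python sketch below assembles the request parts without performing any network call; the function name `get_description_request` and its parameters are hypothetical illustrations.

```python
from typing import Dict, Optional, Tuple


def get_description_request(host: str, description_id: str, collection: str = "all",
                            api_key: Optional[str] = None) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, url, headers) for retrieving one description."""
    headers: Dict[str, str] = {}
    if api_key is not None:
        # Per Tables 2.18 and 2.19, a key is needed to receive the full schema;
        # without it the endpoint returns the short schema.
        headers["API_KEY"] = api_key
    url = f"{host}/descriptions/{collection}/{description_id}"
    return "GET", url, headers


method, url, headers = get_description_request("https://marketplace.example", "some-id", api_key="my-key")
print(method, url, headers)
```

To actually send the request, the three parts could be passed to the standard library's `urllib.request.Request(url, headers=headers, method=method)`.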




Innovative Technologies for the Financial Sector 


Table 2.20. Interface: Get the latest descriptions from all collections.

Title: Get the latest descriptions from all collections
Endpoint: {HOST}/descriptions/all/latest
HTTP Method: GET
Description: This request retrieves the most recently uploaded descriptions, sorted by upload date with the most recent at the top of the list. It is a GET request that uses the keyword “all” and returns the K latest descriptions from all collections. The value of K can be specified through the query parameter “max” (the default value is 20). The descriptions are returned in their short schema.

Finally, the endpoint “Get a list with all descriptions” can return the same results as the current endpoint, as shown in the example at the end of this table.
Body Data: None
Headers: Key Value
None None
URL Parameters: Parameter Value
None None
Query Parameters: Key Value
max Integer value greater than 0 (default: 20)
Restrictions/Special Features: None
Successful Response: A JSON Object with the results (latest descriptions from all collections).
The following is an example of the request in CURL:
• curl --request GET '{HOST}/descriptions/all/latest'
• curl --request GET '{HOST}/descriptions/all/latest?max=5'
Example of a similar response from the endpoint “Get a list with all descriptions”:
• curl --request GET '{HOST}/descriptions/all?sortBy=newest&itemsPerPage=20&page=1'
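The equivalence claimed above (`latest?max=K` versus `sortBy=newest&itemsPerPage=K&page=1`) can be checked with a small local model. This Python sketch is illustrative only: it assumes each description carries `metadata.uploadDate` as in the stored-schema examples of Table 2.24, that all dates share one ISO-like string format (so string order equals date order), and the sample records are made-up test data.

```python
from typing import Any, Dict, List

Description = Dict[str, Any]


def _newest_first(descriptions: List[Description]) -> List[Description]:
    # Assumption: uploadDate strings share one ISO-like format, so string order is date order.
    return sorted(descriptions, key=lambda d: d["metadata"]["uploadDate"], reverse=True)


def latest(descriptions: List[Description], max_items: int = 20) -> List[Description]:
    """Model of .../latest?max=K: the K most recent uploads, newest first."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max must be an integer greater than 0")
    return _newest_first(descriptions)[:max_items]


def newest_page(descriptions: List[Description], items_per_page: int, page: int) -> List[Description]:
    """Model of ...?sortBy=newest&itemsPerPage=N&page=P."""
    start = (page - 1) * items_per_page
    return _newest_first(descriptions)[start:start + items_per_page]


# Hypothetical sample records.
docs = [
    {"id": "a", "metadata": {"uploadDate": "2022-10-13 10:34:28.420Z"}},
    {"id": "b", "metadata": {"uploadDate": "2022-11-01 09:00:00.000Z"}},
    {"id": "c", "metadata": {"uploadDate": "2022-09-01 09:00:00.000Z"}},
]
print([d["id"] for d in latest(docs, 2)])
```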


Table 2.21. Interface: Get the latest descriptions from a specific collection. 


Title: Get the latest descriptions from a specific collection 


Endpoint: {HOST}/descriptions/{collection}/latest 


HTTP Method: GET 


Description: This request is similar to the previous GET request, but it uses a specific collection instead of the keyword “all”, so it returns the K most recent descriptions of the provided collection, sorted by upload date. The value of K can be specified through the query parameter “max” (the default value is 20). The descriptions are returned in their short schema.
Finally, the endpoint “Get a list with all descriptions from a specific collection” can return the same results as the current endpoint, as shown in the example at the end of this table.
Body Data: None 


(Continued) 
Table 2.21. Continued


Headers: Key Value
None None
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}


Query Parameters: Key Value 
max Integer value greater than 0 — Default: 20 


Restrictions/Special Features: None 
Successful Response: A JSON Object with the results (latest descriptions of a collection).
The following is an example of the request in CURL:
• curl --request GET '{HOST}/descriptions/{collection}/latest'
• curl --request GET '{HOST}/descriptions/{collection}/latest?max=5'
Example of a similar response from the endpoint “Get a list with all descriptions from a specific collection”:
• curl --request GET '{HOST}/descriptions/{collection}?sortBy=newest&itemsPerPage=20&page=1'


Table 2.22. Interface: Get random descriptions from all collections. 


Title: Get random descriptions from all collections 

Endpoint: {HOST}/descriptions/all/random 

HTTP Method: GET 

Description: This endpoint returns a number of random descriptions from all collections (it uses the keyword “all”). It is useful for suggesting and promoting different descriptions each time, and it is also used on the home page of the INFINITECH Marketplace platform, where random descriptions are displayed. Through the query parameter “max”, it returns K descriptions, where K can be specified by the users (the default value is 4). The descriptions are returned in their short schema.


Body Data: None 
Headers: Key Value 
None None 
URL Parameters: Parameter Value 
None None 
Query Parameters: Key Value 
max Integer value greater than 0 (default: 4)


Restrictions/Special Features: None 


(Continued) 
Table 2.22. Continued


Successful Response: A JSON Object with the results (random descriptions from all collections). 


The following is an example of the request in CURL: 


• curl --request GET '{HOST}/descriptions/all/random'
• curl --request GET '{HOST}/descriptions/all/random?max=5'


Table 2.23. Interface: Get random descriptions from a specific collection. 


Title: Get random descriptions from a specific collection 

Endpoint: {HOST}/descriptions/{collection}/random

HTTP Method: GET 

Description: This endpoint is similar to the previous one. Instead of the keyword “all”, it uses a specific collection and thus returns K random descriptions of the provided collection. The value of K can be specified through the query parameter “max” (the default value is 4). The descriptions are returned in their short schema.


Body Data: None 
Headers: Key Value 
None None 
URL Parameters: Parameter Value 
collection Valid values: 
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”, 
“webinars”, “applications”, “blockchain”, “third-party-tools”, 
“whitepapers”, “how-to-videos”, “accelerator-programmes”, 
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”} 
Query Parameters: Key Value 
max Integer value greater than 0 (default: 4)
Restrictions/Special Features: None 
Successful Response: A JSON Object with the results (random descriptions from a specific collection).
The following is an example of the request in CURL:
• curl --request GET '{HOST}/descriptions/{collection}/random'
• curl --request GET '{HOST}/descriptions/{collection}/random?max=5'
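What the two "random" endpoints return can be pictured as sampling without replacement. The Python sketch below is a local model under assumptions, not the server implementation: it uses the default of 4 stated in the endpoint descriptions, and `random_descriptions` and its `rng` parameter are hypothetical names (the seedable `rng` only makes the example reproducible).

```python
import random
from typing import Any, List, Optional, Sequence


def random_descriptions(descriptions: Sequence[Any], max_items: int = 4,
                        rng: Optional[random.Random] = None) -> List[Any]:
    """Model of .../random?max=K: up to K distinct descriptions chosen at random."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max must be an integer greater than 0")
    rng = rng or random.Random()
    pool = list(descriptions)
    # Sampling without replacement never repeats a description within one response,
    # and asking for more items than exist simply returns all of them.
    return rng.sample(pool, min(max_items, len(pool)))


print(random_descriptions(["d1", "d2", "d3", "d4", "d5", "d6"], rng=random.Random(7)))
```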


Table 2.24. Interface: Upload/Create a new description with random ID.

Title: Upload/Create a new description with random ID
Endpoint: {HOST}/descriptions/{collection}
HTTP Method: POST

(Continued)

Table 2.24. Continued

Description: Through this POST request, users can upload their descriptions. Users/providers must specify (at the end of the endpoint) the collection in which the description will be stored. The provider’s API key must also be included in the request headers, because the endpoint is available only to authenticated (and verified) users. Importantly, all new descriptions uploaded to the Marketplace must be approved by an administrator before they are made available to other users. Moreover, administrators can upload a description on behalf of other users by adding the key “x-provider” to the request headers. The body of the request must contain the contents of the description as raw data in JSON format. The schema of the description content varies and can be extended. The JSON schema below presents the required fields of a description:

{
"title": "<title of the asset>",
"description": "<description of the provided asset>",
"type": "<type of the asset (same as the collection value)>",
"owner": "<organization / author / etc.>",
"contact": "<provider’s name and email>",
"availability": "<reflects the users who can see the description: public / infinitech / specific Work Packages or Pilots / Other>",
"keywords": ["<keyword 1>", ...],
"comments": "<provider’s comments>"
}

Apart from these fields, there are also optional fields, such as “subtype”, “deliveryDate”, “resources”, “fieldOfUse”, “license” and others. The “resources” field is used when a description refers to assets that are stored in other repositories (e.g. GitHub, GitLab, etc.).
The front-end has appropriate forms that build these JSON schemas automatically.
Body Data: Raw (JSON) data, following the above schema (Content-Type: application/json).
The descriptions can also be uploaded from binary files that contain the above JSON schema (a curl example can be found below).
Headers: Key Value
API_KEY Requester’s API key.
x-provider [Optional & only for administrators] The username of the provider in case the description is uploaded by an administrator and not by the provider.
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}


(Continued) 
Table 2.24. Continued


Query Parameters: None 
Restrictions/Special Features: Available to all authenticated (and verified) users. The 
administrators can upload a description on behalf of other 
users. 
Successful Response: JSON Object with the new description’s ID in its content. 
The following is an example of the request in CURL:
curl --request POST '{HOST}/descriptions/{collection}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-raw '{
"title": "<title of the asset>",
"description": "<description of the provided asset>",
"type": "<type of the asset (same as the collection value)>",
"owner": "<organization / author / etc.>",
"contact": "<provider’s name and email>",
"availability": "<reflects the users who can see the description: public / infinitech / specific Work Packages or Pilots / Other>",
"keywords": ["<keyword 1>", ...],
"comments": "<provider’s comments>"
}'

Example of uploading a description through binary data/file:
curl --request POST '{HOST}/descriptions/{collection}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-binary '@<path to json file>'


Below are some examples of the stored descriptions’ schema:

Example 1: Newly uploaded and updated description with no assets or resources

{
  "id": "algorithms_P8fYOAX67HkK-8fpelT1B-KuR4-Zsck",
  "collection": "datasets",
  "...": {
    "title": "Example.", "type": "algorithms",
    "comments": "Private comment.",
    "contact": "Vasilis Koukos, email",
    "description": "This is an example of description.",
    "keywords": ["testing", "documentation"],
    "owner": "UPRC", "availability": "public",
    // optional fields
    "subtype": "ML algorithm", "fieldsOfUse": ["machine learning", "big data"],
    "license": "...", "any other field provided by the user": "..."
  },
  "metadata": {
    "approved": 1, // 0 for pending / 1 for approved
    "last_updated_by": "vkoukos", "md5": "<md5 hash of the description’s data>", "provider": "vkoukos",
    "updateDate": "2022-11-15 13:50:48.420Z", "uploadDate": "2022-10-13 10:34:28.420Z",
    "version": 2, // the version of the description; increases when updating
    "views": 34
  },
  "assets": [], // list with the uploaded assets for this description
  "resources": [] // list with the external resources / links added to this description
}
Example 2: Description with an uploaded asset

{
  "id": "algorithms_P8fYOAX67HkK-8fpelT1B-KuR4-Zsck",
  ...
  "assets": [{
    "verified": 0, // 0 for pending / 1 for approved
    "downloads": 10, // number of downloads of the asset
    "filename": "kmeans.py",
    "id": "80F7MjRTIxvb-7gIKRAjv-IJ3p-b3vL", // asset's ID
    "md5": "...", "size": "7.92 KB", "updateDate": "Thu, 13 Oct 2022 10:34:28 GMT",
    "version": 1 // the version of the file; increases when updating
  }],
  "resources": []
}


Example 3: Retrieved description (full schema)

{
  "id": "algorithms_P8fYOAX67HkK-8fpelT1B-KuR4-Zsck",
  "collection": "datasets",
  "...": {
    "title": "Example.", "type": "algorithms",
    "comments": "Private comment.",
    "contact": "Vasilis Koukos, email",
    "description": "This is an example of description.",
    "keywords": ["testing", "documentation"],
    "owner": "UPRC", "availability": "public",
    // optional fields
    "subtype": "ML algorithm", "fieldsOfUse": ["machine learning", "big data"]
  },
  "metadata": {
    "approved": 1, // 0 for pending / 1 for approved
    "last_updated_by": "vkoukos", "md5": "<md5 hash of the description’s data>", "provider": "vkoukos",
    "updateDate": "2022-11-15 13:50:48.420Z", "uploadDate": "2022-10-13 10:34:28.420Z",
    "version": 2, // the version of the description; increases when updating
    "views": 34
  },
  "assets": [{ // list with the uploaded assets for this description
    "verified": 0, "downloads": 10, "filename": "kmeans.py",
    "id": "80F7MjRTIxvb-7qIKRAjv-IJ3p-b3vl", "version": 1,
    "md5": "...", "size": "7.92 KB", "updateDate": "Thu, 13 Oct 2022 10:34:28 GMT"
  }],
  "resources": [] // list with the external resources / links added to this description
}
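The stored schema exposes two separate approval flags: `metadata.approved` for the description and `verified` for each uploaded asset, both using 0 for pending and 1 for approved. A client could use them to list what still awaits an administrator, as in this Python sketch. The helper `pending_items` is hypothetical, and the sample dict is a trimmed version of the examples above.

```python
from typing import Any, Dict, List


def pending_items(stored: Dict[str, Any]) -> List[str]:
    """List the IDs (description and/or assets) that still await administrator approval."""
    pending: List[str] = []
    # metadata.approved: 0 for pending / 1 for approved (see Example 1).
    if stored.get("metadata", {}).get("approved") == 0:
        pending.append(stored["id"])
    # assets[].verified uses the same 0/1 convention (see Example 2).
    for asset in stored.get("assets", []):
        if asset.get("verified") == 0:
            pending.append(asset["id"])
    return pending


example = {
    "id": "algorithms_P8fYOAX67HkK-8fpelT1B-KuR4-Zsck",
    "metadata": {"approved": 1, "version": 2},
    "assets": [{"id": "80F7MjRTIxvb-7qIKRAjv-IJ3p-b3vl", "verified": 0, "filename": "kmeans.py"}],
    "resources": [],
}
print(pending_items(example))
```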


Table 2.25. Interface: Upload/Create a new description with given ID. 


Title: Upload/Create a new description with given ID 
Endpoint: {HOST}/descriptions/{collection}/{given_id} 
HTTP Method: POST 


(Continued)

Table 2.25. Continued

Description: This endpoint is similar to the previous POST request. The only difference is that it allows users to specify the ID of the new description by providing it at the end of the endpoint ({given_id}). Currently, this endpoint can be used only by administrators.
Body Data: Raw (JSON) data, following the schema of the previous endpoint (Content-Type: application/json).
The descriptions can also be uploaded from binary files that contain the JSON schema of the previous endpoint (a curl example can be found at the end of the interface).
Headers: Key Value
API_KEY Requester’s API key.
x-provider [Optional & only for administrators] The username of the provider in case the description is uploaded by an administrator and not by the provider.
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}
given_id The ID to be given to the new description.
Query Parameters: None
Restrictions/Special Features: Available only to administrators. The administrators are able to upload a description on behalf of other users.
Successful Response: A JSON Object with the new description’s ID in its content.
The following is an example of the request in CURL:
curl --request POST '{HOST}/descriptions/{collection}/{given_id}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-raw '{
"title": "<title of the asset>",
"description": "<description of the provided asset>",
"type": "<type of the asset (same as the collection value)>",
"owner": "<organization / author / etc.>",
"contact": "<provider’s name and email>",
"availability": "<reflects the users who can see the description: public / infinitech / specific Work Packages or Pilots / Other>",
"keywords": ["<keyword 1>", ...],
"comments": "<provider’s comments>"
}'

Example of uploading a description through binary data/file:
curl --request POST '{HOST}/descriptions/{collection}/{given_id}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-binary '@<path to json file>'
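Both upload endpoints (random ID and given ID) share the same headers and body, so one helper can prepare either request. This Python sketch is illustrative only: it treats every field of the Table 2.24 schema as required (an assumption, since the exact server-side validation is not specified), and `build_upload_request`, `given_id` and `on_behalf_of` are hypothetical names. No network call is made.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Fields shown in the upload schema of Table 2.24 (treated here as required).
REQUIRED_FIELDS = ("title", "description", "type", "owner", "contact",
                   "availability", "keywords", "comments")


def build_upload_request(host: str, collection: str, content: Dict[str, Any], api_key: str,
                         given_id: Optional[str] = None,
                         on_behalf_of: Optional[str] = None) -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, url, headers, body) for creating a description."""
    missing = [field for field in REQUIRED_FIELDS if field not in content]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    url = f"{host}/descriptions/{collection}"
    if given_id is not None:
        # Table 2.25: an explicit ID is appended to the path (administrators only).
        url += f"/{given_id}"
    headers = {"API_KEY": api_key, "Content-Type": "application/json"}
    if on_behalf_of is not None:
        # Administrators may upload on behalf of a provider via the x-provider header.
        headers["x-provider"] = on_behalf_of
    return "POST", url, headers, json.dumps(content)


sample = {
    "title": "Example.", "description": "This is an example of description.",
    "type": "algorithms", "owner": "UPRC", "contact": "Vasilis Koukos, email",
    "availability": "public", "keywords": ["testing"], "comments": "Private comment.",
}
print(build_upload_request("https://marketplace.example", "algorithms", sample, "my-key")[1])
```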
Table 2.26. Interface: Update a specific description (using keyword “all”).

Title: Update a specific description (using keyword “all”)

Endpoint: {HOST}/descriptions/all/{description_id}

HTTP Method: PUT

Description: With this endpoint, providers can update the content of their descriptions and administrators can update any description. It is possible to update the whole description or only some fields. The ID of the description must be given at the end of the endpoint and the user’s API key in the headers of the PUT request. As in the create action, the description should be provided as raw data in JSON format. This endpoint uses the keyword “all” because the descriptions are already stored in the Marketplace, and thus the platform knows the collections in which they have been stored. Below is the standard schema for a description:

{
"title": "<title of the asset>",
"description": "<description of the provided asset>",
"type": "<type of the asset (same as the collection value)>",
"owner": "<organization / author / etc.>",
"contact": "<provider’s name and email>",
"availability": "<reflects the users who can see the description: public / infinitech / specific Work Packages or Pilots / Other>",
"keywords": ["<keyword 1>", ...],
"comments": "<provider’s comments>"
}

Body Data: Raw (JSON) data, following the above or a similar schema (Content-Type: application/json).
The descriptions can also be updated from binary files that contain the above JSON schema (a curl example can be found at the end of the interface).
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
description_id The ID of the description that will be updated.
Query Parameters: None
Restrictions/Special Features: Available only for the providers/creators of the descriptions and for the administrators, who can update any description.
Successful Response: A JSON Object with a success message.
The following is an example of the request in CURL:
curl --request PUT '{HOST}/descriptions/all/{description_id}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-raw '{
"existing or new field": "<value>"
}'
Table 2.27. Interface: Update a specific description (using description’s “collection”).

Title: Update a specific description (using description’s “collection”)

Endpoint: {HOST}/descriptions/{collection}/{description_id}

HTTP Method: PUT

Description: This PUT request is similar to the previous one. The only difference is that, instead of the keyword “all”, it uses the collection in which the description to be updated was stored during its creation/initial upload. The endpoint is restricted and available only to the descriptions’ providers/creators and to administrators, who can update any description. Thus, the requesters’ API keys must be included in the headers of the request. More information about the endpoint can be found in the previous endpoint.

Body Data: Raw (JSON) data, following the schema of the previous endpoint (Content-Type: application/json).
The descriptions can also be updated from binary files that contain the JSON schema of the previous endpoint (a curl example can be found at the end of the interface).

Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}
description_id The ID of the description that will be updated.
Query Parameters: None

Restrictions/Special Features: Available only for the providers/creators of the descriptions and for the administrators, who can update any description.

Successful Response: A JSON Object with a success message.

The following is an example of the request in CURL:
curl --request PUT '{HOST}/descriptions/{collection}/{description_id}' \
--header 'API_KEY: <value>' \
--header 'Content-Type: application/json' \
--data-raw '{
"existing or new field": "<value>"
}'
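The update tables say that either the whole description or only some fields may be sent. Combined with two facts stated elsewhere in this section (the "version" counter increases on every update, and an updated description needs permission again), the effect of a PUT can be modelled locally. This Python sketch is a hypothetical illustration: the shallow merge is an assumption about server behaviour, and `apply_update` is not a Marketplace function.

```python
import copy
from typing import Any, Dict, Tuple


def apply_update(content: Dict[str, Any], metadata: Dict[str, Any],
                 patch: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Local model of a PUT on a description (illustrative, not the server code)."""
    # Fields present in the patch overwrite existing ones, new fields are added,
    # and fields not mentioned are kept (assumed shallow merge).
    new_content = {**copy.deepcopy(content), **copy.deepcopy(patch)}
    new_metadata = dict(metadata)
    # "version" increases on every update (see Example 1 in Table 2.24).
    new_metadata["version"] = metadata.get("version", 1) + 1
    # An updated description needs permission again (see Table 2.32).
    new_metadata["approved"] = 0
    return new_content, new_metadata


content, metadata = apply_update({"title": "Old", "keywords": ["a"]},
                                 {"version": 2, "approved": 1},
                                 {"title": "New", "license": "..."})
print(content, metadata)
```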


Table 2.28. Interface: Delete a specific description (using keyword "all"). 


Title: Delete a specific description (using keyword “all”) 
Endpoint: {HOST}/descriptions/all/{description_id} 
HTTP Method: DELETE 


(Continued)

Table 2.28. Continued

Description: A DELETE request to this endpoint deletes a specific description, identified by its ID. The endpoint is restricted and available only to the descriptions’ providers/creators and to administrators, who can delete any description. Thus, users’ API keys must be included in the headers of the request.

This endpoint uses the keyword “all” instead of the description’s collection (the descriptions are already stored in the Marketplace, and thus the platform knows the collections in which they have been stored).
Body Data: None
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
description_id The ID of the description that will be deleted.
Query Parameters: None
Restrictions/Special Features: Available only for the providers/creators of the descriptions and for the administrators, who can delete any description.
Successful Response: A JSON Object with a success message.
The following is an example of the request in CURL:
curl --request DELETE '{HOST}/descriptions/all/{description_id}' \
--header 'API_KEY: <value>'


Table 2.29. Interface: Delete a specific description (using description’s “collection”).

Title: Delete a specific description (using description’s “collection”)
Endpoint: {HOST}/descriptions/{collection}/{description_id}
HTTP Method: DELETE
Description: This DELETE request is similar to the previous one, with the difference that, instead of the keyword “all”, it uses the collection in which the description to be deleted was stored during its creation/initial upload. The endpoint is restricted and available only to the descriptions’ providers/creators and to administrators, who can delete any description. Thus, the requesters’ API keys must be included in the headers of the request. More information about the endpoint can be found in the previous endpoint.
Body Data: None
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}
description_id The ID of the description that will be deleted.


(Continued) 
Table 2.29. Continued


Query Parameters: None 


Restrictions/Special Features: Available only for the providers/creators of the descriptions and for the 
administrators who can delete any description. 


Successful Response: JSON Object with a successful message. 
The following is an example of the request in CURL: 


curl --request DELETE 
'{HOST}/descriptions/{collection}/{description_id}' \ 
--header 'API_KEY: <value>' 


Table 2.30. Interface: Delete all descriptions. 


Title: Delete all descriptions 

Endpoint: {HOST}/descriptions/all/all 

HTTP Method: DELETE 

Description: This endpoint is available only to administrators, who can delete all existing descriptions from all collections (the keyword “all” is used instead of a specific collection). The endpoint is restricted, and thus users’ API keys must be included in the headers of the request. For security reasons, requesters should provide their password in the body of the request, as raw data (JSON schema):

{ "password": "<value>" }

Body Data: Raw (JSON) data, following the above schema (Content-Type: application/json).
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
None None
Query Parameters: None

Restrictions/Special Features: Available only for the administrators of the Marketplace.
Successful Response: A JSON Object with a success message.

The following is an example of the request in CURL:
curl --request DELETE '{HOST}/descriptions/all/all' \
--header 'API_KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{ "password": "<value>" }'


Table 2.31. Interface: Delete all descriptions from a specific collection. 


Title: Delete all descriptions from a specific collection 
Endpoint: {HOST}/descriptions/{collection}/all 
HTTP Method: DELETE 
Table 2.31. Continued

Description: This DELETE request is similar to the previous one and is available only to administrators, who can delete all descriptions from a specific collection. It is a restricted endpoint, and thus users’ API keys must be included in the headers of the request. For security reasons, requesters should provide their password in the body of the request, as raw data (JSON schema):

{ "password": "<value>" }

Body Data: Raw (JSON) data, following the above schema (Content-Type: application/json).
Headers: Key Value
API_KEY Requester’s API key.
URL Parameters: Parameter Value
collection Valid values:
{“algorithms”, “notebooks”, “ml-models”, “datasets”, “apis”,
“webinars”, “applications”, “blockchain”, “third-party-tools”,
“whitepapers”, “how-to-videos”, “accelerator-programmes”,
“innovation-support-services”, “workshops”, “courses”,
“containers”, “uncategorized”, “other”}
Query Parameters: None
Restrictions/Special Features: Available only for the administrators of the Marketplace.
Successful Response: A JSON Object with a success message.

The following is an example of the request in CURL:
curl --request DELETE '{HOST}/descriptions/{collection}/all' \
--header 'API_KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{ "password": "<value>" }'
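The two bulk-delete endpoints follow the same pattern: the path ends in `/all`, and the administrator's password travels in a JSON body next to the `API_KEY` header. The Python sketch below only assembles these parts (no request is sent); `delete_all_request` is a hypothetical helper name.

```python
import json
from typing import Dict, Tuple


def delete_all_request(host: str, api_key: str, password: str,
                       collection: str = "all") -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, url, headers, body) for the administrators' bulk delete."""
    # {HOST}/descriptions/all/all clears every collection;
    # {HOST}/descriptions/{collection}/all clears a single collection.
    url = f"{host}/descriptions/{collection}/all"
    headers = {"API_KEY": api_key, "Content-Type": "application/json"}
    # The administrator's password is sent again in the JSON body as a safeguard.
    body = json.dumps({"password": password})
    return "DELETE", url, headers, body


print(delete_all_request("https://marketplace.example", "admin-key", "secret", "datasets")[:2])
```

Because these calls are irreversible, keeping the collection argument explicit at call sites (rather than relying on the `"all"` default) may be the safer habit.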


Table 2.32. Interface: Get a list with all descriptions that need permission (administrators’ 
action). 


Title: Get a list with all descriptions that need permission (administrators’ action) 

Endpoint: {HOST}/descriptions/permit/all 

HTTP Method: GET 

Description: This endpoint returns the descriptions from all collections (since the keyword “all” is used) that need permission before they become available to the Marketplace’s users.

A description needs permission either when it is uploaded or after it has been updated by the users. Moreover, the endpoint is available only to administrators, and thus the requester’s API key is required in the headers of the request. Finally, the endpoint offers some standard query parameters that specify the format of the results; they are described below (Query Parameters).


Body Data: None 
Headers: Key Value 


API_KEY Requester’s API key.
URL Parameters: Parameter Value 


None None 


Query Parameters: Key Value
sortBy [Optional] Sorts the descriptions by a field; the default is
the "newest" key. The value should be one of the
following:
"newest": sort by date in descending order.
"oldest": sort by date in ascending order.
"title": sort by title in ascending order.
itemsPerPage [Optional] Returns the results separated in pages (arrays)
of N items. The number N is specified by the value of this
key and must be an integer greater than or equal to 1.
If the key is not used or has a non-accepted value, the
results are returned on a single page.
page [Optional] This key can only be used together with the
"itemsPerPage" key. It returns only the specified (by the key's
value) page instead of all the pages created by the key
"itemsPerPage". The value must be an integer greater than or equal
to 1. The default value is 0, meaning that all pages are returned.


Restrictions/Special Features: Available only to the administrators.
Successful Response: JSON Object with the descriptions (from all collections) that need
permission in its content.


The following are examples of the request in cURL:

curl --request GET '{HOST}/descriptions/permit/all' --header 'API_KEY: <value>'

curl --request GET '...?sortBy={value}'
curl --request GET '...?itemsPerPage={value}'
curl --request GET '...?itemsPerPage={value}&page={value}'
curl --request GET '...?sortBy={value}&itemsPerPage={value}'
curl --request GET '...?sortBy={value}&itemsPerPage={value}&page={value}'
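The Python sketch below re-creates the documented sortBy, itemsPerPage and page rules on an in-memory list, to show how they combine. It illustrates the table under several assumptions and is not the back-end's code. The assumptions are: the flat field names uploadDate and title, a fallback to "newest" for unknown sortBy values, wrapping the single page in a list, and an empty result for an out-of-range page. ISO-formatted date strings sort correctly as plain text, so no date parsing is needed.

```python
from typing import Any, Dict, List, Optional

# Hypothetical flat record; real descriptions may nest these fields differently
Description = Dict[str, Any]


def sort_descriptions(descs: List[Description], sort_by: Optional[str] = "newest") -> List[Description]:
    """Mimic the sortBy key: "newest" (default), "oldest" or "title"."""
    if sort_by == "oldest":
        return sorted(descs, key=lambda d: d["uploadDate"])  # ascending date
    if sort_by == "title":
        return sorted(descs, key=lambda d: d["title"])  # ascending title
    # "newest" and (by assumption) any unknown value: descending date
    return sorted(descs, key=lambda d: d["uploadDate"], reverse=True)


def _to_int(raw: Optional[str], default: int) -> int:
    """Parse a query-string value, falling back to a default when it is not an integer."""
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def paginate(items: List[Any], items_per_page: Optional[str] = None,
             page: Optional[str] = None) -> List[Any]:
    """Mimic itemsPerPage/page: split into pages of N items, optionally pick one page."""
    n = _to_int(items_per_page, 0)
    if n < 1:
        return [items]  # key missing or non-accepted value: one single page
    pages = [items[i:i + n] for i in range(0, len(items), n)]
    p = _to_int(page, 0)
    if p < 1:
        return pages  # default page=0: all pages are returned
    return pages[p - 1] if p <= len(pages) else []  # out-of-range page: empty (assumed)


docs = [
    {"title": "B", "uploadDate": "2022-10-01"},
    {"title": "A", "uploadDate": "2022-11-20"},
    {"title": "C", "uploadDate": "2022-09-05"},
]
newest_first = sort_descriptions(docs)
second_page = paginate(list(range(5)), "2", "2")
```

Note that page is ignored whenever itemsPerPage is missing or invalid, which mirrors the rule that page "can only be used" together with itemsPerPage.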


Table 2.33. Interface: Get a list with all descriptions from a specific collection that need 
permission (administrators' action). 


Title: Get a list with all descriptions from a specific collection that need permission
(administrators' action)
Endpoint: {HOST}/descriptions/permit/{collection}
HTTP Method: GET
Description: This request is similar to the above request. The only difference between the two actions
is that the current request retrieves the descriptions that need permission from a specific
collection (it uses a specific {collection} value instead of the keyword "all"). For more details,
refer to the above endpoint.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
URL Parameters: Parameter Value
collection Valid values:
{"algorithms", "notebooks", "ml-models", "datasets", "apis",
"webinars", "applications", "blockchain", "third-party-tools",
"whitepapers", "how-to-videos", "accelerator-programmes",
"innovation-support-services", "workshops", "courses",
"containers", "uncategorized", "other"}


Query Parameters: As in the above request. 

Restrictions/Special Features: Available only to the administrators. 

Successful Response: JSON Object with the descriptions (from a specific collection) that need 
permission in its content. 

The following is an example of the request in cURL:

curl --request GET '{HOST}/descriptions/permit/{collection}' --header 'API_KEY: <value>'

** The previous endpoint's examples also apply to the current one, but
using a {collection} value instead of the keyword "all".
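Both permit-listing endpoints take the same {collection} values and query parameters, so a client may want to check them before calling the back-end. The Python sketch below builds the URL for the endpoint above, or for the "all" variant of Table 2.32. The helper name permit_list_url and the placeholder host are hypothetical. The collection spellings follow the lists in this section (e.g. "third-party-tools"). The API_KEY header still has to be added separately, as in the cURL examples.

```python
from typing import Optional
from urllib.parse import urlencode

# {collection} values accepted by the marketplace, as listed in the tables above
VALID_COLLECTIONS = frozenset({
    "algorithms", "notebooks", "ml-models", "datasets", "apis", "webinars",
    "applications", "blockchain", "third-party-tools", "whitepapers",
    "how-to-videos", "accelerator-programmes", "innovation-support-services",
    "workshops", "courses", "containers", "uncategorized", "other",
})


def permit_list_url(host: str, collection: str = "all", sort_by: Optional[str] = None,
                    items_per_page: Optional[int] = None, page: Optional[int] = None) -> str:
    """Hypothetical helper: URL for GET {HOST}/descriptions/permit/{collection}."""
    if collection != "all" and collection not in VALID_COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    params = {}
    if sort_by is not None:
        params["sortBy"] = sort_by
    if items_per_page is not None:
        params["itemsPerPage"] = items_per_page
        if page is not None:  # "page" only makes sense together with "itemsPerPage"
            params["page"] = page
    url = f"{host}/descriptions/permit/{collection}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

For example, `permit_list_url("http://localhost:8080", "notebooks", sort_by="title", items_per_page=10, page=2)` yields a URL ending in `/descriptions/permit/notebooks?sortBy=title&itemsPerPage=10&page=2`.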


Table 2.34. Interface: Approve or reject a description that needs permission, using keyword
"all" (administrators' action).


Title: Approve or reject a description that needs permission, using keyword "all"
(administrators' action)

Endpoint: {HOST}/descriptions/permit/all/{description_id} 

HTTP Method: POST 

Description: This endpoint is used by administrators to approve or reject a specific
description (using its ID) that needs administrators' permission. The
endpoint is restricted and available only to administrators and thus, the
requesters must provide their API key in the headers of the request. Also, this
endpoint uses the keyword "all" instead of the collection in which the
description is stored (the next endpoint uses the collection). An important
parameter/key that must be included in the request's headers is the "x-permission"
key, which should contain the word "approve" to approve the description or the word
"disapprove" to reject it. A rejection/disapproval of a description results in the
deletion of the description and all its assets/contents.
Body Data: None 
Headers: Key Value 
API_KEY Requester's API key.
x-permission Valid values:
{"approve", "disapprove"}
URL Parameters: Parameter Value
description_id The ID of the description that will be
approved or rejected.


Query Parameters: None 
Restrictions/Special Features: Available only to the administrators. 
Successful Response: JSON Object with a successful message. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/descriptions/permit/all/{description_id}' \
--header 'API_KEY: <value>' --header 'x-permission: <value>'


Table 2.35. Interface: Approve or reject a description that needs permission, using the
description's "collection" (administrators' action).


Title: Approve or reject a description that needs permission, using the description's
"collection" (administrators' action)
Endpoint: {HOST}/descriptions/permit/{collection}/{description_id}
HTTP Method: POST
Description: This request is similar to the above request. The only difference between the
two actions is that the current request uses the value of the {collection} in
which the specific description is stored. For more details, refer to the above
endpoint.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
x-permission Valid values:
{"approve", "disapprove"}
URL Parameters: Parameter Value
collection Valid values:
{"algorithms", "notebooks", "ml-models", "datasets",
"apis", "webinars", "applications", "blockchain",
"third-party-tools", "whitepapers", "how-to-videos",
"accelerator-programmes", "innovation-support-services",
"workshops", "courses", "containers", "uncategorized",
"other"}
description_id The ID of the description that will be approved or rejected.
Query Parameters: None
Restrictions/Special Features: Available only to the administrators.
Successful Response: JSON Object with a successful message.


The following is an example of the request in cURL:

curl --request POST \
'{HOST}/descriptions/permit/{collection}/{description_id}' \
--header 'API_KEY: <value>' --header 'x-permission: <value>'


Table 2.36. Interface: Approve or reject all descriptions that need permission, using keyword
"all" (administrators' action).


Title: Approve or reject all descriptions that need permission, using keyword "all"
(administrators' action)
Endpoint: {HOST}/descriptions/permit/all/all
HTTP Method: POST
Description: This endpoint is used by the administrators to approve or reject all the stored descriptions
(from all the collections, since the keyword "all" is used) that need administrators' permission.
The endpoint is restricted and available only to administrators and thus, the requesters
must provide their API keys in the headers of the request. An important parameter/key
that must be included in the headers of the request is the "x-permission" key, which should
have as its value the word "approve" for the descriptions to be approved, or the
word "disapprove" for them to be rejected. A rejection/disapproval of the descriptions results in
the deletion of the descriptions and all of their assets and contents.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
x-permission Valid values:
{"approve", "disapprove"}
URL Parameters: Parameter Value
None None
Query Parameters: None
Restrictions/Special Features: Available only to the administrators.
Successful Response: JSON Object with a successful message.
The following is an example of the request in cURL:

curl --request POST '{HOST}/descriptions/permit/all/all' \
--header 'API_KEY: <value>' --header 'x-permission: <value>'


Table 2.37. Interface: Approve or reject all descriptions that need permission under a 
specific collection, using a "collection" value (administrators' action). 


Title: Approve or reject all descriptions that need permission under a specific
collection, using a "collection" value (administrators' action)

Endpoint: {HOST}/descriptions/permit/{collection}/all 

HTTP Method: POST 

Description: This request is similar to the above request. The only difference is that,
using the current endpoint, the administrators are able to approve or reject all the
descriptions of a specific {collection}. The endpoint is restricted and available only to
administrators and thus, the requesters must provide their API keys in the headers of
the request. An important parameter/key that must be included in the headers of the
request is the "x-permission" key, which should have as its value the word "approve" for
the descriptions to be approved, or the word "disapprove" for them to be rejected.

A rejection/disapproval of the descriptions results in the deletion of the
descriptions and all of their assets and contents.


Body Data: None 
Headers: Key Value 


API_KEY Requester's API key.
x-permission Valid values:
{"approve", "disapprove"}
URL Parameters: Parameter Value
collection Valid values:
{"algorithms", "notebooks", "ml-models", "datasets", "apis",
"webinars", "applications", "blockchain", "third-party-tools",
"whitepapers", "how-to-videos", "accelerator-programmes",
"innovation-support-services", "workshops", "courses",
"containers", "uncategorized", "other"}


Query Parameters: None 
Restrictions/Special Features: Available only to the administrators. 
Successful Response: JSON Object with a successful message. 


The following is an example of the request in cURL:

curl --request POST '{HOST}/descriptions/permit/{collection}/all' \
--header 'API_KEY: <value>' --header 'x-permission: <value>'
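The four approve/reject endpoints of Tables 2.34 to 2.37 differ only in whether the keyword "all" or a concrete value is used for the collection and for the description ID. The Python sketch below builds, but does not send, such a request with the standard library. build_permit_request is a hypothetical helper, and the host, key and ID are placeholders. A "disapprove" decision deletes descriptions and their assets, so the helper rejects any other spelling rather than passing it through.

```python
from urllib.request import Request


def build_permit_request(host: str, api_key: str, decision: str,
                         collection: str = "all", description_id: str = "all") -> Request:
    """Hypothetical helper for POST {HOST}/descriptions/permit/{collection}/{description_id}.

    Leaving collection and/or description_id as "all" selects the keyword
    variants of Tables 2.34 to 2.37.
    """
    if decision not in ("approve", "disapprove"):
        raise ValueError('x-permission must be "approve" or "disapprove"')
    url = f"{host}/descriptions/permit/{collection}/{description_id}"
    # No body is sent: the decision travels in the x-permission header.
    # Caution: "disapprove" deletes the matching descriptions and their assets.
    return Request(url, headers={"API_KEY": api_key, "x-permission": decision}, method="POST")


# Approve one description stored under "notebooks" (placeholder host, key and ID)
one = build_permit_request("http://localhost:8080", "<api-key>", "approve", "notebooks", "1234")
```

Passing the object to `urllib.request.urlopen()` would send it. urllib stores header names in capitalised form (e.g. `Api_key`, `X-permission`). HTTP header names are case-insensitive, so a compliant server should treat them the same.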


2.5 Search Functionality on Descriptions 


The search functionality is a vital requirement for most services, as it reduces
the number of objects returned by a query. Thus, the back-end's endpoints that
retrieve multiple descriptions simultaneously support some relevant query filters.
These filters enable the users of the marketplace to search for assets based on various
parameters from the content of the stored descriptions.

More specifically, the interfaces of the back-end that return lists of assets
support additional query parameters with any key-value pair. Query parameters are a
defined set of parameters attached to the end of a URL; they are used to
help search for specific content or actions based on the data being passed. In order to
append query parameters to a URL, a question mark "?" is added to the
end of the URL, followed immediately by a pair of a key and a value, separated by
an equals sign "=". Moreover, a URL can have multiple parameters, by adding
an ampersand symbol "&" between each key-value pair.
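Python's `urllib.parse.urlencode` produces exactly these query-string rules: "=" inside a pair and "&" between pairs. The caller adds the leading "?". The sketch below is a minimal illustration with a placeholder host. build_search_url is a hypothetical helper, not part of the back-end.

```python
from urllib.parse import quote, urlencode


def build_search_url(host: str, params: dict) -> str:
    """Append query parameters to {HOST}/descriptions/all.

    urlencode joins each key and value with "=" and the pairs with "&";
    quote_via=quote encodes spaces as %20 (the default quote_plus would use "+"),
    and safe="," keeps the commas used by list-valued operators readable.
    """
    base = f"{host}/descriptions/all"
    query = urlencode(params, safe=",", quote_via=quote)
    return f"{base}?{query}" if query else base


url = build_search_url("http://localhost:8080", {"title": "machine learning", "type": "algorithms"})
```

Here `url` is `http://localhost:8080/descriptions/all?title=machine%20learning&type=algorithms`. Letting the library do the encoding avoids hand-escaping spaces and other reserved characters.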

In the context of the INFINITECH Marketplace and its descriptions, the keys
added to the URLs as query parameters must be valid, in the sense that they must exist
as fields in the descriptions and searching on them must be meaningful. Below are some valid
syntaxes for advanced search with additional query parameters. The examples use
the "Get a list with all descriptions" interface.


Single attribute:

'{HOST}/descriptions/all?<attribute_name>=<value>'

More attributes:

'{HOST}/descriptions/all?<attribute_1>=<value>&<attribute_2>=<value>&...'


Moreover, the back-end, which is written in Python (as described in Section 3.1),
enables access to nested fields of a dictionary/JSON object using a dot "." between a key
at the first level of the hierarchy and a key at the second level (this applies to all
levels, down to the lowest level). Thus, the next example is also a valid schema of a query:

For attributes at a lower hierarchical level:


'{HOST}/descriptions/all?<attribute_level_1>.<...>.<attribute_level_n>=<value>'
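Python dictionaries do not resolve dotted keys by themselves, so a back-end that supports this syntax has to walk the hierarchy one level at a time. The sketch below shows one way to do that. get_nested and the sample record are hypothetical, not the marketplace's actual code or schema. The record's "info" and "metadata" levels are borrowed from the operator examples in this section.

```python
from collections.abc import Mapping
from typing import Any


def get_nested(document: Mapping, dotted_key: str, default: Any = None) -> Any:
    """Resolve a dotted query key such as "metadata.provider" against nested dicts."""
    current: Any = document
    for part in dotted_key.split("."):
        # Stop as soon as a level is missing or is not a dictionary
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


example = {
    "info": {"title": "DeepVaR", "type": "notebooks"},
    "metadata": {"provider": "vkoukos", "views": 120},
}
```

With this record, `get_nested(example, "metadata.views")` returns `120`, while an unknown path such as `"metadata.owner"` falls back to the default (`None`).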


To sum up, given the above syntaxes of a valid query, the following search example
request in cURL returns the descriptions whose title contains the value
"machine learning" and whose type is "algorithms":


curl --request GET \
'{HOST}/descriptions/all?title=machine%20learning&type=algorithms'


It should be noted that "%20" is the URL (percent) encoding of the space character,
whose ASCII code is 32 (0x20 in hexadecimal).

In addition, the back-end supports advanced searching using some operators
that extend the keys of the query parameters, using a dot "." between the keys and
the operators. The supported operators, along with a description of their usage, are
listed in Table 2.38.

Below are some examples of using these operators.


eq: '{HOST}/descriptions/all?metadata.provider.eq=vkoukos'
ne: '{HOST}/descriptions/all?metadata.version.ne=1'
gt: '{HOST}/descriptions/all?metadata.views.gt=100'
gte: '{HOST}/descriptions/all?info.type.gte=datasets'
lt: '{HOST}/descriptions/all?metadata.uploadDate.lt=2022-10-15'
lte: '{HOST}/descriptions/all?metadata.views.lte=20'
in: '{HOST}/descriptions/all?info.title.in=machine,learning,algorithm'
nin: '{HOST}/descriptions/all?info.keywords.nin=economy,finance'
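In the examples above, the operator is simply the last dot-separated segment of the key, so parsing and evaluating a condition takes only a few lines. The Python sketch below is a simplified, hypothetical illustration with these limits:
* It returns True/False instead of awarding points.
* It compares numerically only when both sides parse as numbers, and otherwise as text. Text comparison also orders ISO dates such as 2022-10-15 correctly.
* It handles scalar fields only, so list fields such as keywords are not covered.

```python
import operator
from typing import Any, Tuple

# Comparison operators of Table 2.38 mapped onto Python's operator module
_COMPARE = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}
_OPERATORS = set(_COMPARE) | {"in", "nin"}


def split_key(query_key: str) -> Tuple[str, str]:
    """Split "metadata.views.gt" into ("metadata.views", "gt"); a plain key means "eq"."""
    field, _, suffix = query_key.rpartition(".")
    if field and suffix in _OPERATORS:
        return field, suffix
    return query_key, "eq"


def _coerce(stored: Any, raw: str) -> Tuple[Any, Any]:
    """Compare as numbers when both sides parse as numbers, otherwise as text."""
    try:
        return float(stored), float(raw)
    except (TypeError, ValueError):
        return str(stored), raw


def matches(stored: Any, op: str, raw: str) -> bool:
    """Boolean check of one condition; the back-end's scoring is not modelled here."""
    if op == "in":
        return any(matches(stored, "eq", v) for v in raw.split(","))
    if op == "nin":
        return all(matches(stored, "ne", v) for v in raw.split(","))
    left, right = _coerce(stored, raw)
    return _COMPARE[op](left, right)
```

For instance, `split_key("metadata.views.gt")` gives `("metadata.views", "gt")`, and `matches(150, "gt", "100")` is `True`.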


Furthermore, the back-end’s search mechanism uses a ranking system for the 
results. More specifically, for each description in the results, it maintains a score 
resulting from the points it receives for each search argument. 

In an equality search (using the "=" symbol or the "eq" operator) for a specific key, the
points that a description receives can be one of the following:

* 5: if the values are exactly equal, with a case-sensitive comparison.
* 4: if the values are equal only when case is ignored.
* 3: if the values are similar (e.g., the first value contains the second value but
they are not the same), with a case-sensitive comparison.
* 2: if the values are similar only when case is ignored.
* 0: if the values do not match.
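A short Python function makes the 5/4/3/2/0 scale concrete. It is a sketch under assumptions. "Similar" is read as "the stored value contains the searched value", and both values are compared as text. eq_score is a hypothetical name, not the back-end's own function.

```python
def eq_score(stored: object, wanted: str) -> int:
    """Points of an equality search ("=" or "eq") on the 5/4/3/2/0 scale."""
    s, w = str(stored), str(wanted)
    if s == w:
        return 5  # exactly the same, case-sensitive
    if s.lower() == w.lower():
        return 4  # the same once case is ignored
    if w in s:
        return 3  # similar: stored value contains the searched one (case-sensitive)
    if w.lower() in s.lower():
        return 2  # similar once case is ignored
    return 0  # no match


titles = ["Deep Learning", "machine learning basics", "Machine Learning"]
# Rank by descending score, so the best matches come first
ranked = sorted(titles, key=lambda t: eq_score(t, "machine learning"), reverse=True)
```

Searching for "machine learning" gives the scores 4, 3 and 0. "Machine Learning" is equal once case is ignored, "machine learning basics" contains the searched value, and "Deep Learning" does not match. `ranked` lists them in that order.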


The other operators simply receive 1 point if the condition matches (i.e. the
condition is "true"). The operator "in" applies the operator "eq" (or the symbol "=") to
each value in its "array" and thus follows the same scoring system. Finally, the operator
"nin" applies the operator "ne" to each value in its "array" (Table 2.38).

Table 2.38. Back-end's search operators.

Operator  Usage  Example

eq  Full title: equal  <key>.eq=<value>
This operator performs an equality search and is used in exactly the same way as the
equality symbol "=". It applies to both texts (strings) and numbers.

ne  Full title: not equal  <key>.ne=<value>
This operator performs a non-equality search. It applies to both texts
(strings) and numbers.

gt  Full title: greater than  <key>.gt=<value>
This operator searches for a key with a value greater than the provided
one. It applies to both texts (strings) and numbers.

gte  Full title: greater than or equal  <key>.gte=<value>
This operator searches for a key with a value greater than or equal to
the provided one. It applies to both texts (strings) and numbers.

lt  Full title: less than  <key>.lt=<value>
This operator searches for a key with a value less than the provided
one. It applies to both texts (strings) and numbers.

lte  Full title: less than or equal  <key>.lte=<value>
This operator searches for a key with a value less than or equal to
the provided one. It applies to both texts (strings) and numbers.

in  Full title: in (equal to one of the values)  <key>.in=<value_1>,<value_2>
This operator searches for a key with a value equal to one of the
provided values. The <value> may contain multiple values separated
by a comma ",". It applies to both texts (strings) and numbers.

nin  Full title: not in (not equal to any of the values)  <key>.nin=<value_1>,<value_2>
This operator searches for a key with a value not equal to any of the
provided values. The <value> may contain multiple values separated
by a comma ",". It applies to both texts (strings) and numbers.


2.6 APIs Related to Assets

This group of APIs offers functionalities intended for the management of the assets.
They support all CRUD operations for the files that are stored in the marketplace.
Table 2.39 presents the endpoints related to Assets as they are in the latest version of
the marketplace's back-end, where:

* {HOST} refers to the hosting server: the domain name and the port on which the
back-end runs.
* {asset_id} refers to the ID of a specific asset.
* {description_id} refers to the ID of the description to which the new asset
will be linked.
* {given_asset_id} is used in the "upload a new asset with given ID" action, providing the
new asset's ID.
* Some of these actions require additional fields in the headers or even in the
body of the HTTP request. An example of a required field is the API key that
users use in order to authenticate themselves to the platform.

Tables 2.40 to 2.46 provide details on each of these actions respectively.


Table 2.39. APIs of the back-end related to Assets.

Action  HTTP Method  Endpoint
Get a list with the stored assets  GET  {HOST}/assets
Get a specific asset, using its ID  GET  {HOST}/assets/{asset_id}
Upload a new asset with random ID, linked to a specific description  POST  {HOST}/assets/{description_id}
Upload a new asset with given ID, linked to a specific description  POST  {HOST}/assets/{description_id}/{given_asset_id}
Update a specific asset, using its ID  PUT  {HOST}/assets/{asset_id}
Delete a specific asset, using its ID  DELETE  {HOST}/assets/{asset_id}
Delete all assets (administrators' action)  DELETE  {HOST}/assets/all


Table 2.40. Interface: Get a list with the stored assets.


Title: Get a list with the stored assets
Endpoint: {HOST}/assets
HTTP Method: GET
Description: A GET request to this endpoint results in the retrieval of a list with the
stored assets and some additional information about them. The endpoint is
restricted and available only to administrators and thus, the requesters must
provide their API keys in the headers of the request.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
URL Parameters: Parameter Value
None None
Query Parameters: None
Restrictions/Special Features: Available only to administrators.
Successful Response: JSON Object with the results.


The following is an example of the request in cURL:

curl --request GET '{HOST}/assets' --header 'API_KEY: <value>'


Table 2.41. Interface: Get a specific asset, using its ID. 


Title: Get a specific asset, using its ID
Endpoint: {HOST}/assets/{asset_id}
HTTP Method: GET
Description: This endpoint is used to retrieve a specific stored asset. The asset's ID
is required for its retrieval. Also, this endpoint is
restricted and thus, the API key of the requester must be included in the
headers of the request.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
URL Parameters: Parameter Value
asset_id The ID of the asset that will be retrieved.
Query Parameters: None
Restrictions/Special Features: Available to all authenticated (and verified) users.
Successful Response: Binary data.
The following is an example of the request in cURL:

curl --request GET '{HOST}/assets/{asset_id}' --header 'API_KEY: <value>'


Table 2.42. Interface: Upload a new asset with random ID, linked to a specific description. 


Title: Upload a new asset with random ID, linked to a specific description 
Endpoint: {HOST}/assets/{description_id} 

HTTP Method: POST 

Description: Through this endpoint, users can upload their assets. The ID of the description
with which the asset is going to be linked must be added at the end of the endpoint.
The API key of the provider and the asset's filename must also be added to the headers
of the request, whilst the asset must be uploaded as form-data with the key "asset".
Body Data: Data Type: Form Data
Key Value
asset Binary data/Path to file
Headers: Key Value
API_KEY Requester's API key.
x-filename New asset's filename.
URL Parameters: Parameter Value
description_id The ID of the description with which the new
asset is going to be linked.
Query Parameters: None
Restrictions/Special Features: Available to the providers of the descriptions with which the assets will be linked,
and also to the administrators, who can upload assets on behalf of the
providers.
Successful Response: JSON Object with the new asset's ID in its content.
The following is an example of the request in cURL:

curl --request POST '{HOST}/assets/{description_id}' \
--header 'API_KEY: <value>' --header 'x-filename: <value>' \
--form 'asset=@"<full path to asset>"'


Table 2.43. Interface: Upload a new asset with given ID, linked to a specific description.


Title: Upload a new asset with given ID, linked to a specific description
Endpoint: {HOST}/assets/{description_id}/{given_asset_id}
HTTP Method: POST
Description: This endpoint is similar to the previous one. The difference is that the
current endpoint makes it possible to specify the ID of the new asset, by providing it
at the end of the endpoint ({given_asset_id}). Currently, this endpoint can be
used only by the administrators of the marketplace platform.
Body Data: Data Type: Form Data
Key Value
asset Binary data/Path to file
Headers: Key Value
API_KEY Requester's API key.
x-filename New asset's filename.
URL Parameters: Parameter Value
description_id The ID of the description with which the new asset is
going to be linked.
given_asset_id The ID to be given to the new asset.
Query Parameters: None
Restrictions/Special Features: Available only to administrators, whether they upload an asset for their own
descriptions or upload an asset on behalf of the providers.
Successful Response: JSON Object with the new asset's ID in its content.
The following is an example of the request in cURL:

curl --request POST '{HOST}/assets/{description_id}/{given_asset_id}' \
--header 'API_KEY: <value>' --header 'x-filename: <value>' \
--form 'asset=@"<full path to asset>"'


Table 2.44. Interface: Update a specific asset, using its ID. 


Title: Update a specific asset, using its ID 

Endpoint: {HOST}/assets/{asset_id} 

HTTP Method: PUT 

Description: With this PUT request, it is possible to update an already stored asset. The asset's ID,
which should be included at the end of the endpoint, determines which asset will be replaced
by the newly provided asset. As in the upload, the asset should be sent as
form-data with the key "asset", and the headers of the request have to contain the provider's
API key. It should be noted that users can only update the assets they have provided
themselves (except for administrators).
Body Data: Data Type: Form Data
Key Value
asset Binary data/Path to file
Headers: Key Value
API_KEY Requester's API key.
x-filename [Optional] Asset's new filename.
URL Parameters: Parameter Value
asset_id The ID of the asset that will be updated.
Query Parameters: None
Restrictions/Special Features: Available only to the providers of the descriptions/assets and to the
administrators, who can update stored assets on behalf of the providers.
Successful Response: JSON Object with a successful message.
The following is an example of the request in cURL:

curl --request PUT '{HOST}/assets/{asset_id}' \
--header 'API_KEY: <value>' --header 'x-filename: <value>' \
--form 'asset=@"<full path to asset>"'


Table 2.45. Interface: Delete a specific asset, using its ID. 


Title: Delete a specific asset, using its ID 

Endpoint: {HOST}/assets/{asset_id} 

HTTP Method: DELETE 

Description: A DELETE request to this endpoint results in the deletion of a specific
asset, which is found by its ID. This endpoint is restricted and thus,
the users' API keys must be included in the headers of the request. It should be
noted that an asset can be deleted only by its provider and by the administrators.
Body Data: None
Headers: Key Value
API_KEY Requester's API key.
URL Parameters: Parameter Value
asset_id The ID of the asset that will be deleted.
Query Parameters: None
Restrictions/Special Features: Available only to the providers of the descriptions/assets and to the
administrators, who can delete any stored asset.
Successful Response: JSON Object with a successful message.
The following is an example of the request in cURL:

curl --request DELETE '{HOST}/assets/{asset_id}' --header 'API_KEY: <value>'


Table 2.46. Interface: Delete all assets (administrators’ action). 


Title: Delete all assets (administrators’ action) 

Endpoint: {HOST}/assets/all 

HTTP Method: DELETE 

Description: This DELETE request is similar to the above request, with the difference that it deletes all
the assets, as it uses the keyword "all". Again, the requester's API key is required,
and the endpoint is available only to administrators. For security reasons, the requesters
should also provide their password in the body of their request, as raw data (JSON schema):

{ "password": "<value>" }

Body Data: Raw (JSON) Data, following the above schema (Content-Type: application/json).
Headers: Key Value
API_KEY Requester's API key.
URL Parameters: Parameter Value
None None
Query Parameters: None
Restrictions/Special Features: Available only to the administrators of the platform.
Successful Response: JSON Object with a successful message.


The following is an example of the request in cURL:

curl --request DELETE '{HOST}/assets/all' \
--header 'API_KEY: <value>' --header 'Content-Type: application/json' \
--data-raw '{ "password": "<value>" }'


2.7 Other Interfaces 


One endpoint that has not been mentioned yet is the back-end's root interface, which
presents a roadmap of the back-end's main interfaces. It is described in
Table 2.47:


Table 2.47. Interface: Root interface. 


Title: Root interface 

Endpoint: {HOST} 

HTTP Method: GET 

Description: This endpoint returns a list of the back-end's interfaces that are available
to all users. It acts as a roadmap, providing the interfaces along
with short information about the functionalities that they trigger. The
information is structured as a tree.


Body Data: None 

Headers: Key Value 
None None 

URL Parameters: Parameter Value 
None None 

Query Parameters: None 

Restrictions/Special Features: None


Successful Response: Back-end’s roadmap in text/plain. 


The following is an example of the request in cURL:


curl --request GET '{HOST}'


Apart from the interfaces already described, the back-end provides the following
restricted interfaces, which will not be described here, since they are relevant mostly
to the platform's administrators. In short, the interfaces' titles are:


* Get a list of the administrators,
* Add a new administrator,
* Remove an administrator,
* Get system's backup,
* Restore a backup,
* Get a list of all users,
* Get a list with system metrics/report.


2.8 Validation Scenarios 


This section presents some validation scenarios regarding the functionalities of 
the back-end component of the marketplace in combination with the endpoints 
described in the previous sections. 

Specifically, the scenarios presented show how users can be authenticated to the
back-end, how they can upload a new description, how they can upload an asset for the
description they created in the second scenario, and how they can retrieve an asset that
matches their interests.

It is noted that all scenarios use the back-end exclusively. Similar scenarios
that use the front-end are presented in deliverable D8.6 “IoT and
Blockchain Solutions Marketplace — II”.


2.8.1 Authentication Scenario 


We consider a registered user of the INFINITECH Marketplace with username
"test_user" and password "test_password". As described in Section 3.2.1, in order
to be authenticated, the user needs to send a POST HTTP request to the endpoint
"{HOST}/accounts/users/authentication". In the body of the request, the user should
add their credentials in the following format:


{ "username": "test_user", "password": "test_password" }


Thus, according to the description of the endpoint, the user should send the
following request (using cURL):

curl --request POST '{HOST}/accounts/users/authentication' \
--header 'Content-Type: application/json' \
--data-raw '{ "username": "test_user", "password": "test_password" }'

If everything is correct (e.g. valid credentials), the response to this request is
a temporary code that acts as an API key. With this API key, the user is able
to use the remaining functionalities of the marketplace.


2.8.2 Upload Description Scenario 


In this scenario, we use the user of the previous scenario and assume that
the user has followed that scenario in order to be authenticated to the marketplace
(i.e. has an API key).

The same user wants to upload a new asset/solution to the Marketplace. In this
case, the first step is to upload/create a new description in the
Marketplace.

All descriptions have a specific schema and should be filled in accordingly. When
the front-end is used, this is done automatically through the corresponding forms, but when
the request is made directly to the back-end, the description must be created by the
user.

As described in Section 3.2.2, the user must add this description in JSON format
as raw data to the body of the request. We suppose that the user wants to upload
the description of the Python Notebook "DeepVaR", which is already available in the
marketplace (https://marketplace.infinitech-h2020.eu/infi_assets/deepvar-value-at-risk-prediction-leveraging-deep-learning).
The user (the provider) prepares the corresponding description as follows:


{
"title": "DeepVaR: Value-at-Risk prediction leveraging Deep Learning",
"description": "This component both explains and implements the DeepVaR, which is a Value-at-Risk model based on deep neural networks and Monte Carlo simulations.",
"type": "Notebook",
"owner": "INNOV-ACTS LTD",
"contact": "<value>",
"availability": "INFINITECH",
"keywords": [ "Finance", "Risk Management" ],
"comments": null
}


In order to upload this description to the marketplace, the user should send a
POST request to the endpoint "{HOST}/descriptions/{collection}", where {collection}
is the value "notebooks". The user should also include the API key that was
received as a response during the authentication (previous scenario).

The following is an example of the corresponding request in cURL:


curl --request POST '{HOST}/descriptions/notebooks' \
--header 'API_KEY: <value_retrieved_from_scenario_1>' \
--header 'Content-Type: application/json' \
--data-raw '{
"title": "DeepVaR: Value-at-Risk prediction leveraging Deep Learning",
"description": "This component both explains and implements the DeepVaR, which is a Value-at-Risk model based on deep neural networks and Monte Carlo simulations.",
"type": "Notebook",
"owner": "INNOV-ACTS LTD",
"contact": "<value>",
"availability": "INFINITECH",
"keywords": [ "Finance", "Risk Management" ],
"comments": null }'


The response to this request is the ID of the new description. This ID is used
in the next scenario to upload the description's asset.
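For readers who prefer to script this step, the same request can be assembled with Python's standard library, as in the minimal sketch below. The host URL and key are placeholders, and the header name `API_KEY` is an assumption: the cURL examples print it as "API KEY", but HTTP header names cannot contain spaces, so the exact name should be checked against the Marketplace API documentation. The sketch only builds the request; sending it is left to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumption: printed as "API KEY" in the cURL examples; an underscore is used
# here because HTTP header names cannot contain spaces.
API_KEY_HEADER = "API_KEY"


def build_description_upload(host: str, collection: str, api_key: str,
                             description: dict) -> urllib.request.Request:
    """Build (without sending) POST {host}/descriptions/{collection} with a JSON body."""
    body = json.dumps(description).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/descriptions/{collection}",
        data=body,
        method="POST",
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
    )


description = {
    "title": "DeepVaR: Value-at-Risk prediction leveraging Deep Learning",
    "type": "notebook",
    "owner": "INNOV-ACTS LTD",
    "keywords": ["Finance", "Risk Management"],
    "comments": None,  # serialized as JSON null
}

# Placeholder host and key; real values come from the deployment and scenario 1.
request = build_description_upload("https://marketplace.example/api", "notebooks",
                                   "<value_retrieved_from_scenario_1>", description)
# To send it: urllib.request.urlopen(request).read() should return the new description ID.
```

urllib normalizes the capitalization of header names (for example `Api_key`); HTTP header names are case-insensitive, so this does not change the meaning of the request.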


2.8.3 Upload Asset Scenario


This scenario extends the previous ones. It uses the API key of the user who was
authenticated in the first scenario and the ID of the description uploaded in the
second scenario. Here, the same user wants to upload an asset for the DeepVaR
notebook description: a compressed file (.zip) that needs to be linked to that
specific description.

As described in Section 3.2.4, the user sends a POST request to the endpoint
"{HOST}/assets/{description_id}", where {description_id} is the ID returned in the
second scenario when the description to which the asset will be linked was
uploaded. The user must also add to the request headers a) the API key from the
first scenario and b) the asset's filename (e.g. deepvar.zip).

The following is an example of the scenario's request in cURL:


curl --request POST '{HOST}/assets/{description_id}' \
  --header 'API KEY: <value_retrieved_from_scenario_1>' \
  --header 'x-filename: deepvar.zip' \
  --form 'asset=@"./deepvar.zip"'

The response contains the new asset's ID. Besides storing the asset, the request
links it to the description.
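The same upload can be expressed in Python without third-party packages by encoding the multipart/form-data body by hand. This is a sketch, not a Marketplace client: the form field name `asset` and the `x-filename` header mirror the cURL example, `API_KEY` is the same assumed header name as before, and the host, description ID and file bytes are placeholders.

```python
import urllib.request
import uuid

API_KEY_HEADER = "API_KEY"  # assumption: printed as "API KEY" in the cURL examples


def build_asset_upload(host: str, description_id: str, api_key: str,
                       filename: str, content: bytes) -> urllib.request.Request:
    """Build (without sending) POST {host}/assets/{description_id} as multipart/form-data."""
    boundary = uuid.uuid4().hex
    # A single form part named "asset", like cURL's --form 'asset=@"./deepvar.zip"'
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="asset"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/assets/{description_id}",
        data=head + content + tail,
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "x-filename": filename,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values; in practice the bytes would be read from ./deepvar.zip.
request = build_asset_upload("https://marketplace.example/api", "notebooks_agWabSJaFsgse",
                             "<value_retrieved_from_scenario_1>", "deepvar.zip",
                             b"PK\x03\x04placeholder")
```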


2.8.4 Retrieve Asset Scenario 


This scenario combines several interfaces, including the authentication interface
used in the first scenario, to show how a user retrieves an asset from a description
found by searching the descriptions/solutions stored in the Marketplace for ones
that cover the user's specific needs.
Consider an INFINITECH Marketplace user who wants to find a deep learning model
for estimating Value at Risk (a financial metric that estimates the risk of an
investment). The first step is to search for descriptions relevant to this need.
Based on Sections 3.2.2 (APIs related to Descriptions) and 3.2.3 (Search
functionality on Descriptions), the user sends a GET request to the endpoint
“{HOST}/descriptions/all” (i.e. search across all descriptions and collections,
Table 2.16), with query parameters whose key-value pairs select the descriptions
that match the "Value at Risk" need.

More specifically, the user adds the following key-value pairs and operators as
query parameters:


Key           Operator            Value
title         in                  value, risk
keywords      "eq" or just "="    finance
description   in                  Value-at-Risk, model


The following is the request (in cURL) for user's first step (i.e. searching): 


curl --request GET '{HOST}/descriptions/all?info.title.in=value,risk&info.keywords=finance&info.description.in=Value-at-Risk,model'
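The search URL can also be generated programmatically. In the Python sketch below, a filter with operator "=" becomes a plain `info.<field>=<value>` pair and any other operator is appended as a suffix (`info.<field>.<op>`), mirroring the cURL example; multiple values are joined with commas. The `build_search_url` helper and the host are illustrative, not part of the Marketplace API.

```python
from urllib.parse import urlencode


def build_search_url(host: str, filters: list) -> str:
    """filters: (field, operator, values) tuples; operator "=" means a plain key=value pair."""
    params = []
    for field, op, values in filters:
        key = f"info.{field}" if op == "=" else f"info.{field}.{op}"
        params.append((key, ",".join(values)))
    # Keep commas literal so the URL matches the cURL example; other characters are escaped.
    return f"{host}/descriptions/all?" + urlencode(params, safe=",")


url = build_search_url("https://marketplace.example/api", [
    ("title", "in", ["value", "risk"]),
    ("keywords", "=", ["finance"]),
    ("description", "in", ["Value-at-Risk", "model"]),
])
print(url)
```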


The response to this request is a list of descriptions in JSON format. As
explained in Table 2.16, this interface returns the descriptions in their short
schema. Browsing the results, the user finds a description that may meet the
desired needs: the DeepVaR notebook used in the second scenario, whose short
description schema is the following:
{
  "collection": "notebooks",
  "id": "notebooks_agWabSJaFsgse",
  "info": {
    "title": "DeepVaR: Value-at-Risk prediction leveraging Deep Learning.",
    "keywords": ["Finance", "Risk Management"],
    "owner": "INNOV-ACTS LTD",
    "short_desc": "This component both explains and implements the DeepVaR, which is a Value-at-Risk model based.",
    "type": "notebook"
  },
  "metadata": {"provider": "...", "updateDate": "...", "uploadDate": "...", "views": 1359}
}


From this description, the user takes the ID (i.e. “notebooks_agWabSJaFsgse”) to
use in the second step: retrieving the full schema (and information) of this
solution/description.
This step is done by sending a GET request to the interface of Table 2.18, “Get
a specific description (using keyword ‘all’)”, i.e. to
“{HOST}/descriptions/all/{description_id}”, where {description_id} is replaced by
the ID of the above description. Alternatively, the user can use the interface of
Table 2.19, which requires the collection of the description; this is known, since
the collection (i.e. “notebooks”) is included in the short schema.

Note that the user must provide an API key (retrieved after authentication in
scenario 1) to retrieve the full schema of the description. The following is the
request (in cURL) for the user's second step (i.e. description retrieval):


curl --request GET '{HOST}/descriptions/all/notebooks_agWabSJaFsgse' \
  --header 'API KEY: <value>'


The response to this request is a JSON object containing the full schema of the
description. Since an asset for “DeepVaR” was uploaded in scenario 3, the full
schema of the description is the following:


{
  "id": "notebooks_agWabSJaFsgse",
  "collection": "notebooks",
  "info": {
    "title": "DeepVaR: Value-at-Risk prediction leveraging Deep Learning.",
    "type": "notebook",
    "comments": null,
    "contact": "...",
    "description": "This component both explains and implements the DeepVaR, which is a Value-at-Risk model based on deep neural networks and Monte Carlo simulations.",
    "keywords": ["Finance", "Risk Management"],
    "owner": "INNOV-ACTS LTD",
    "availability": "INFINITECH",
    "license": "...",
    "subtype": null,
    "fieldsOfUse": []
  },
  "metadata": {
    "approved": "...",
    "updatedBy": "...",
    "provider": "...",
    "updateDate": "...",
    "uploadDate": "...",
    "views": 1357
  },
  "assets": [
    {
      "verified": 1,
      "downloads": 40,
      "filename": "deepvar.zip",
      "id": "80F7MjRTIxvb-7qIKRAjv-IJ3p-b3vL",
      "version": 1
    }
  ],
  "resources": []
}


The last step, to retrieve an asset that seems helpful for the user's needs, is to
send a GET request to the endpoint “{HOST}/assets/{asset_id}”, where {asset_id} is
the ID of the description's asset, found in the “id” field under the “assets” field
of the full schema above. As described in Section 3.2.4 and in Table 2.41, this
endpoint is restricted, so the user must also provide an API key (scenario 1) in
the request headers.

Thus, the user should make the following request (example in cURL): 


curl --request GET '{HOST}/assets/80F7MjRTIxvb-7qIKRAjv-IJ3p-b3vL' \
  --header 'API KEY: <value>'


This request returns the binary data of the specified asset (as a file, depending
on the type of asset), which the user can then use for work, research, or other
purposes.
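The last two steps (reading the asset IDs from the full schema and requesting each asset with an API key) can be combined in a short Python sketch. The JSON string is an abbreviated copy of the full schema shown earlier; the host, the key and the `API_KEY` header name are placeholders or assumptions, as in the previous examples.

```python
import json
import urllib.request

API_KEY_HEADER = "API_KEY"  # assumption: printed as "API KEY" in the cURL examples


def asset_requests(host: str, full_schema_json: str, api_key: str) -> list:
    """Return (filename, GET request) pairs for every asset in a full description schema."""
    schema = json.loads(full_schema_json)
    pairs = []
    for asset in schema.get("assets", []):
        request = urllib.request.Request(f"{host}/assets/{asset['id']}",
                                         headers={API_KEY_HEADER: api_key})
        pairs.append((asset["filename"], request))
    return pairs


full_schema = """{
  "id": "notebooks_agWabSJaFsgse",
  "collection": "notebooks",
  "assets": [{"id": "80F7MjRTIxvb-7qIKRAjv-IJ3p-b3vL", "filename": "deepvar.zip", "version": 1}],
  "resources": []
}"""

downloads = asset_requests("https://marketplace.example/api", full_schema, "<value>")
# To download, for each (filename, request):
#     with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
#         out.write(response.read())
```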


DOI: 10.1561/9781638282334.ch3


Chapter 3 


INFINITECH Technology Assets 


3.1 Anti-Money Laundering 


Spyros X 


A fraud detection and money laundering detection system. Fraud detection is
ML-based, while money laundering detection is rule-based. It follows a
business-to-business (B2B) model and targets NFT, cryptocurrency, and bank owners
who want to secure their networks.


Graph-based Anomaly Data Platform 


Web tool including a demo to test a variety of use cases (e.g. self-financing by
bank account/recharge). It allows both loading built-in Nexi use cases on
synthetic data and importing external files.


Graph-based Anomaly Exploration Platform 


The Graph-based Anomaly Exploration Platform is a Jupyter Server engine for data
scientists. It is a server-based Jupyter notebook that allows expert data
professionals to develop scripts and perform more customized analyses. The data
specialist can interact with algorithm outputs or payments data, writing code to
explore results with customized queries and/or algorithms.
3.2 API


Cross Over Open Banking Hub v1.0 


CrossOver is a PSD2 engine that helps developers and fintechs make use of all 
PSD2 APIs using a single interface. It handles the entire lifecycle of consents, stores 
the users’ accounts, and executes payments so you can focus on the core aspects of 
your application. 


Reportbrain News Sentiment API 


The Reportbrain News API facilitates news features by structuring the unstructured
data in the news: it transforms, normalizes, and augments that data for a specific
application. Customers can use Reportbrain’s query language to explore the data and
retrieve news articles and their metadata. The Reportbrain News API is accessed
over HTTP through a list of available calls and their parameters.


INFINITECH Graph Data Model Online Tool 


Data interoperability for fintechs and the financial sector. This online tool
refers to the INFINITECH Project Online Ontology Mapping Framework and Toolkit; it
includes the Graph Data Model, the Data Sharing Files, and the Ontology Files used
in the INFINITECH project.


3.3 Applications 


Personal Insurance from Data Analytics (P.I.D.A) 


A system that helps customize personalized insurance plans for vehicle owners. By
gathering vehicle and trip data from IoT sensors and using a distributed database
paired with neural networks, we are able to profile each vehicle owner accurately
and provide a suitable insurance plan, including Pay How You Drive (PHYD) and
Pay As You Drive (PAYD).


Blockchain-enabled Consent Management 


A decentralized and robust blockchain-enabled consent management mechanism that
enables the sharing of customers’ consent to exchange and use their customer data
across different banking institutions. It enables financial institutions to manage
and share their customers’ consents effectively, in a transparent and unambiguous
manner. It stores the consents and their complete update history, with full
consent versioning, in a secure and trusted manner.
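The versioning idea can be illustrated with a hash chain, the tamper-evidence mechanism that blockchains build on. The Python sketch below is an in-memory toy, not the INFINITECH component: it has no consensus, distribution or access control, and the field names (`customer`, `recipient`, `granted`) are hypothetical. Each consent update becomes a new version whose hash covers the previous entry, so rewriting any past version breaks verification.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    # Hash a canonical JSON encoding so that key order does not matter
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ConsentLedger:
    """Toy append-only, hash-chained consent history (single process, no consensus)."""
    entries: list = field(default_factory=list)

    def record(self, customer: str, recipient: str, granted: bool) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        version = 1 + sum(1 for e in self.entries
                          if e["customer"] == customer and e["recipient"] == recipient)
        entry = {"customer": customer, "recipient": recipient, "granted": granted,
                 "version": version, "prev": prev_hash}
        entry["hash"] = _digest(entry)  # computed before the "hash" key is added
        self.entries.append(entry)
        return entry

    def current(self, customer: str, recipient: str) -> bool:
        # The latest version wins; no record at all means no consent
        for e in reversed(self.entries):
            if e["customer"] == customer and e["recipient"] == recipient:
                return e["granted"]
        return False

    def verify(self) -> bool:
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


ledger = ConsentLedger()
ledger.record("customer-1", "bank-B", True)
ledger.record("customer-1", "bank-B", False)  # consent withdrawn: version 2
```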
INFINITECH Open API Gateway


The INFINITECH Open API Gateway is a sophisticated API Gateway that 
encompasses the Open API specification to provide a single point of entry for the 
added-value functionalities of INFINITECH which are based on microservices. 


INFINITECH Data Collection 


The INFINITECH Data Collection component is designed and implemented 
aiming to address the need for a holistic mechanism that will empower the data 
providers to configure and execute data collection pipelines tailored to their needs. 


Scalable Transaction Graph Analysis Component 


This application constructs the transaction graph from blockchain transactions and 
analyses the graph using graph algorithms. 
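As a rough illustration of the first step (turning transactions into a graph), the Python sketch below aggregates hypothetical (sender, receiver, amount) transfers into a weighted directed graph and computes in- and out-degrees, one of the simplest graph statistics. It is not the component's implementation, which targets blockchain-scale data.

```python
from collections import defaultdict


def build_transaction_graph(transfers):
    """Aggregate (sender, receiver, amount) tuples into {sender: {receiver: total}}."""
    graph = defaultdict(lambda: defaultdict(float))
    for sender, receiver, amount in transfers:
        graph[sender][receiver] += amount  # parallel transfers are summed into one edge
    return {s: dict(nbrs) for s, nbrs in graph.items()}


def degrees(graph):
    """Return (out_degree, in_degree) dicts counting distinct counterparties."""
    out_degree = {node: len(nbrs) for node, nbrs in graph.items()}
    in_degree = defaultdict(int)
    for nbrs in graph.values():
        for receiver in nbrs:
            in_degree[receiver] += 1
    return out_degree, dict(in_degree)


transfers = [("addr1", "addr2", 1.0), ("addr1", "addr2", 0.5),
             ("addr2", "addr3", 2.0), ("addr3", "addr1", 0.1)]
graph = build_transaction_graph(transfers)
out_degree, in_degree = degrees(graph)
```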


Automatic Data Anonymization Tool for Preserving Privacy and 
Utility on Datasets 


Gradiant’s anonymization tool modifies data to preserve privacy. It is especially
suited to datasets that contain personal data and have to be outsourced or shared
with a third party. It provides different anonymization algorithms that aim to
avoid data combinations that could lead to re-identification of the data subjects,
while monitoring different privacy and utility metrics to assess the impact of the
anonymization process.
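One widely used privacy metric of this kind is k-anonymity: every combination of quasi-identifiers (attributes that could be linked with external data) must be shared by at least k records. The Python sketch below measures k before and after a simple generalization (10-year age bands, truncated postcodes). It only illustrates the idea; the records and column names are invented, and Gradiant's tool may use different algorithms and metrics.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Coarsen quasi-identifiers: 10-year age band and 3-digit postcode prefix."""
    low = (record["age"] // 10) * 10
    return {**record, "age": f"{low}-{low + 9}", "zip": record["zip"][:3] + "**"}


records = [
    {"age": 34, "zip": "36201", "balance": 1200},
    {"age": 37, "zip": "36208", "balance": 560},
    {"age": 52, "zip": "36204", "balance": 9800},
    {"age": 58, "zip": "36209", "balance": 300},
]
k_before = k_anonymity(records, ["age", "zip"])
k_after = k_anonymity([generalize(r) for r in records], ["age", "zip"])
```

Coarser generalization raises k but removes detail, which is the privacy/utility trade-off the tool monitors.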


3.4 Blockchain 


Dynamic DAO Structure for Business Scaling


The dynamic DAO structure aims to solve the following problem: creating an optimal
framework for a unicorn DAO. Our solution follows Gall’s law, starting from a
simple model in which small DAOs within a DAO are represented by their elected
officials to ensure efficiency and quality decision-making.

Gradually, a decentralized monolithic DAO would be achieved, using a transparent,
interpretable AI-aided decision-making model to determine each member's voting
quality and power, while optimizing decision quality through accumulated historical
data and iterations over time.


BC Based Secure Execution Environment and 
Data Marketplace for Federated Learning 


This work is a collaboration between FBK and IBM within the INFINITECH
project. FBK has developed a fraud detection federated learning algorithm, run by
a consortium of organizations on their own nodes and their own data, while sharing
the intermediate learning results with each other to achieve higher precision.
IBM provides a blockchain-based secure execution framework for the distributed 
fraud algorithm execution, recording all the meta-information regarding the exe- 
cution, and recording the shared intermediate results on a transparent, immutable, 
and verifiable ledger. At the end of the execution a completed model can be shared 
as a tradeable asset on a blockchain-based data marketplace also developed by IBM. 


ERC1155 Token Smart Contract for Hyperledger Fabric 


In Hyperledger Fabric, smart contracts, referred to as chaincode, run in an
isolated container environment such as Docker and can be written in standard
programming languages such as Go and Node.js. Smart contracts offer the interfaces
that applications outside the blockchain network use to interact with the
distributed ledger, providing the required level of abstraction as well as
increased privacy and confidentiality. To further promote privacy and
confidentiality, Fabric enables the creation of channels, in which the participants
keep a ledger of transactions that is separate from the rest of the blockchain
network and visible only to the participants of the channel. Finally, it provides
private data, where collections of data are visible and accessible only to a subset
of the participants of a specific channel. [D4.7]

The implementation and execution of smart contracts vary depending on the
blockchain implementation [D4.7]. Smart contracts (or chaincode) are leveraged to
facilitate all the operations performed on the ledger within the blockchain
network. They are trusted distributed applications, deployed within the nodes of
the blockchain network, that encapsulate the business logic of blockchain
applications. Smart contracts encode the agreements that the participants of the
blockchain network have formulated regarding the generation of new facts that are
added to the ledger and that update the current and historical state of the facts
already stored in it. In this sense, smart contracts enable users of the blockchain
network to create new transactions by invoking the smart contracts’ functions.
Smart contracts facilitate controlled access to the ledger, offering a layer of
abstraction on top of the intended transactions, encapsulating and simplifying all
the relevant information, ensuring compliance with the underlying legal agreements,
and automating several aspects of the transactions.

One of the most critical concepts of blockchain technology is the consensus model
used to validate transactions and keep the ledger synchronized across the
blockchain network. The consensus model undertakes the validation and approval of
candidate transactions and ensures that the copies of the ledger kept in the nodes
of the blockchain network are updated with the same transactions in the same order.
As the blockchain network is composed of multiple nodes, it is very likely that
many publishing nodes will compete at the same time to publish new blocks.
Additionally, conflicts might arise when nodes publish new blocks at approximately
the same time. Hence, a method is required to ensure that transactions are written
to the ledger in the same order in which they were generated, and that malformed
or malicious transactions are rejected. For this reason, blockchains, depending on
their implementation, exploit different consensus models from computer science,
such as CFT (crash fault-tolerant) or BFT (Byzantine fault-tolerant) ordering,
while substantial research effort is spent on defining alternative consensus
models that address this issue with fewer trade-offs. [D4.7]
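Why a single agreed order matters can be seen in a toy Python model: an ordering step validates transactions and appends them to one log, and every replica that replays that log reaches the same state. The `Orderer` and `replay` names are hypothetical, and the sketch deliberately omits what real CFT/BFT protocols add (leader election, replication of the log itself, tolerance of crashed or malicious nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Orderer:
    """Toy ordering step: rejects malformed transactions and fixes a total order."""
    log: list = field(default_factory=list)

    def submit(self, tx: dict) -> bool:
        if not isinstance(tx.get("key"), str) or not isinstance(tx.get("value"), int):
            return False  # malformed: never reaches the ledger
        self.log.append(tx)
        return True


def replay(log: list) -> dict:
    """Apply writes in log order; a later write to a key overwrites an earlier one."""
    state = {}
    for tx in log:
        state[tx["key"]] = tx["value"]
    return state


orderer = Orderer()
orderer.submit({"key": "acct-1", "value": 100})
orderer.submit({"key": "acct-1", "value": 50})
orderer.submit({"key": 42})  # rejected

node_a = replay(orderer.log)
node_b = replay(list(orderer.log))  # another replica, same order
```

Replaying the same two writes in the opposite order yields a different balance, which is exactly the divergence the consensus model prevents.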


3.5 Data Marketplace 


BC Based Secure Execution Environment and 
Data Marketplace for Federated Learning 


This work is a collaboration between FBK and IBM within the INFINITECH
project. FBK has developed a fraud detection federated learning algorithm, run by
a consortium of organizations on their own nodes and their own data, while sharing
the intermediate learning results with each other to achieve higher precision.

IBM provides a blockchain-based secure execution framework for the distributed 
fraud algorithm execution, recording all the meta-information regarding the exe- 
cution, and recording the shared intermediate results on a transparent, immutable, 
and verifiable ledger. At the end of the execution a completed model can be shared 
as a tradeable asset on a blockchain-based data marketplace also developed by IBM. 

The blockchain-based data marketplace provides a tradeable assets catalogue, 
where consumer organizations can search for available assets and gain access by 
paying for those assets using tokens. 


3.6 Database Management Systems 


Ultra Scalable Transactional Processing Without Blocking in the 
Advent of Failures 


The invented transactional processing method scales transactional management
horizontally in a linear way and does not block transactional management when a
node fails: all transactions can progress except those involving data of the
failed node.
3.7 Dataset


Dataset with Risk Estimates of Major Currency Pairs on 
the Forex Market 


This dataset includes Value at Risk (VaR) and Expected Shortfall (ES) estimates
for the major currency pairs on the Forex market. Specifically, it provides daily
VaR and ES estimates for the AUDUSD, EURCAD, EURCHF, EURUSD, GBPUSD, and
USDJPY FX assets from January 2021 to September 2022.
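For readers unfamiliar with the two measures, the Python sketch below computes them by historical simulation for a single return series: VaR at confidence level alpha is the loss exceeded on only a (1 - alpha) fraction of days, and ES is the average loss over that worst tail. The dataset's own estimates may come from different models; the returns here are made up, and the tail-size convention (the ceil((1 - alpha) * n) worst days) is one of several in use.

```python
import math


def historical_var_es(returns, alpha=0.99):
    """Historical-simulation VaR and ES at confidence alpha, reported as positive losses."""
    losses = sorted((-r for r in returns), reverse=True)  # worst loss first
    # round() guards against float noise such as (1 - 0.99) * 100 = 1.0000000000000009
    n_tail = max(1, math.ceil(round((1 - alpha) * len(losses), 9)))
    tail = losses[:n_tail]
    var = tail[-1]           # smallest loss inside the tail
    es = sum(tail) / n_tail  # average loss inside the tail
    return var, es


daily_returns = [0.01, -0.02, 0.005, -0.05, 0.003, -0.01, 0.02, -0.03, 0.0, 0.015]
var_80, es_80 = historical_var_es(daily_returns, alpha=0.80)
```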

Transaction Graph Dataset for the Bitcoin Blockchain 


This dataset contains bitcoin transfer transactions extracted from the Bitcoin Main- 
net blockchain. 


High Frequency (Tick) Data of Historical FOREX Prices 


Price tick data for the most liquid Forex assets (AUDUSD, EURCAD, EURCHF,
EURUSD, GBPUSD, USDJPY). Period covered: 9 March 2020 to 7 September 2022.


Synthetic Data of Transactions for Loan Fraud 


The dataset includes tagged synthetic transaction data; the tags mark the
transactions that correspond to fraudulent immediate loan requests.


AEMET Weather Dataset 


Dataset composed of AEMET’s weather data.


CTAG Alerts Dataset 


Traffic alerts data extracted from CTAG stations during the duration of the pilot, 
describing the category of the alert, location, and timestamp. 


SUMO Vigo Vehicles Sample 


Dataset compiling a comprehensive set of Vigo vehicle data generated by the 
SUMO urban mobility simulator. 


NBG Datasets - Card Transactions Dataset 


Card Transactions Dataset. 
NBG Datasets - Deposit Account Transactions Dataset


Deposit Account Transactions Dataset. 


Pagerank Dataset for Bitcoin Blockchain 


This dataset contains Pagerank values and rankings for Bitcoin blockchain addresses 
and transaction IDs. 


Pagerank Dataset for Ethereum Blockchain 


This dataset contains Pagerank values and rankings for Ethereum blockchain 
addresses. 
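PageRank scores an address by how much "importance" flows into it along transaction edges. The Python sketch below is a textbook power-iteration version on a small hypothetical graph (damping factor 0.85, with rank held by nodes without outgoing edges spread uniformly); the parameters used to produce these datasets are not stated here.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


scores = pagerank([("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")])
ranking = sorted(scores, key=scores.get, reverse=True)  # highest PageRank first
```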


NBG Datasets - Data Dictionary 


Data Dictionary for the datasets. 


NBG Datasets 


Investment Transactions Dataset. 


Transaction Graph Dataset for the Ethereum Blockchain 


This dataset contains ether as well as popular ERC20 token transfer transactions 
extracted from the Ethereum Mainnet blockchain. 


Processed Synthetic RWD for Tristate Modelling 


Processed version of the Raw Synthetic RWD into weekly vectors for training, 
validation, and testing, including 15 input attributes and a tristate outcome 
attribute. 


Processed Synthetic RWD for Binary Modelling 


Processed version of the Raw Synthetic RWD into weekly vectors for training, 
validation, and testing, including 15 input attributes and a binary outcome 
attribute. 




Raw Synthetic RWD 


Real World Data from a simulator for 1,000 people belonging to any of four 
behavioural groups: athletic, normal, unfit and feeble, simulated over 2 years and 
3 months. 


3.8 Docker 


Data Protection Orchestrator - DPO 


The Data Protection Orchestrator orchestrates privacy, security and data protection 
components and services. 


3.8.1 Docker Containers


Pseudoanonymizer 


The main purpose of the service is to mask personal information about clients
while keeping the data informative enough for analysis. The service supports
different types of conversion for different field types. Text, BIC and IBAN fields
are encrypted using a hash or block cipher function.

With this method, the original information is hidden, while identical values
remain identical after encryption. Numeric fields and timestamps are rounded and
noise is added; the base unit for timestamps is days.

The service works by sending an HTTP POST request to [website-address]/pseudonymize
with file and data key values. The data is processed as specified in the flow and
returned in the same format in which it was received.
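The three conversion rules can be sketched in Python as follows. This is not the service's code: a keyed HMAC-SHA256 stands in for the "hash or block cipher function", and the secret key, rounding step and noise ranges are illustrative assumptions. Because the keyed hash is deterministic, equal IBANs map to equal tokens, so joins and counts still work on the masked data; unlike a block cipher, the hash cannot be reversed.

```python
import hashlib
import hmac
import random
from datetime import date, datetime, timedelta

SECRET_KEY = b"change-me"  # hypothetical key; in practice keep it out of source code


def mask_identifier(value: str) -> str:
    """Text, BIC and IBAN fields: deterministic keyed hash, truncated for readability."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_amount(amount: float, step: float = 10.0, noise: float = 1.0, rng=None) -> float:
    """Numeric fields: round to a multiple of `step`, then add uniform noise."""
    rng = rng or random.Random()
    return round(amount / step) * step + rng.uniform(-noise, noise)


def mask_timestamp(ts: datetime, noise_days: int = 1, rng=None) -> date:
    """Timestamps: truncate to the day (base unit), then shift by a random whole number of days."""
    rng = rng or random.Random()
    return ts.date() + timedelta(days=rng.randint(-noise_days, noise_days))


iban = "GR1601101250000000012300695"
token_a, token_b = mask_identifier(iban), mask_identifier(iban)
rng = random.Random(42)
amount = mask_amount(1234.0, rng=rng)
day = mask_timestamp(datetime(2022, 3, 9, 14, 30), rng=rng)
```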


Blockchain-enabled Consent Management 


A decentralized and robust blockchain-enabled consent management mechanism that
enables the sharing of customers’ consent to exchange and use their customer data
across different banking institutions. It enables financial institutions to manage
and share their customers’ consents effectively, in a transparent and unambiguous
manner. It is capable of storing the consents and their complete update history,
with full consent versioning, in a secure and trusted manner.


INFINITECH Open API Gateway 


The INFINITECH Open API Gateway is a sophisticated API Server that encom- 
passes the Open API specification to provide a single entry point for the added-value 
functionalities of INFINITECH which are based on microservices. 
INFINITECH Data Collection


The INFINITECH Data Collection component is designed and implemented 
aiming to address the need for a holistic mechanism that will empower the data 
providers to configure and execute data collection pipelines tailored to their needs. 


Decode - Metadata Creation for Banking Domain 


Extraction of complex metadata from raw bank documents is paramount to support
intelligent data indexing and to share information effectively within large
organisations. Banking language is very specific and rather different from common
language, so general-purpose semantic engines may not be effective in
understanding banking-related concepts. This raises the need for innovative
solutions for metadata extraction.

The asset that has been developed is based on a weakly-supervised neural 
methodology for creating semantic metadata from bank documents. It exploits 
a neural pre-training method optimized against legacy semantic resources able to 
minimize the training effort. The method has been tested on documents from the 
Italian banking community. 


3.9 Finance 


AI-Based Agentive Bank Clerk and Financial Consultant


The AI-based agentive bank clerk and financial consultant solution provides
natural language processing functionalities, allowing users to complete tasks and
access information in a more intuitive and user-friendly way.

This could potentially improve market awareness and efficiency by providing 
faster and more accurate responses to customer inquiries. 


FinFlink: Streaming Technical Indicator Generation for Apache 
Flink 


Most financial analytics and machine-learned financial models do not process raw
trading data, but instead convert it into a series of more meaningful ‘technical
indicators’ that capture price movements and trends. FinFlink is a
Java library which was developed during the INFINITECH project and imple- 
ments real-time technical indicators within the Apache Flink distributed processing 
platform. 
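Streaming indicators are computed incrementally: a small state object is updated for each incoming price instead of recomputing over the full history. FinFlink is a Java library for Flink; the Python sketch below only illustrates the per-event update logic for two common indicators, a simple moving average (SMA) and an exponential moving average (EMA), and does not reflect FinFlink's API. Seeding the EMA with the first price is one convention among several.

```python
from collections import deque
from typing import Optional


class StreamingSMA:
    """Simple moving average over the last `period` prices, O(1) per update."""

    def __init__(self, period: int):
        self.window = deque(maxlen=period)
        self.total = 0.0

    def update(self, price: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest price is about to be evicted
        self.window.append(price)
        self.total += price
        return self.total / len(self.window)


class StreamingEMA:
    """Exponential moving average with smoothing factor 2 / (period + 1)."""

    def __init__(self, period: int):
        self.alpha = 2.0 / (period + 1)
        self.value: Optional[float] = None

    def update(self, price: float) -> float:
        if self.value is None:
            self.value = price  # seed with the first observation
        else:
            self.value = self.alpha * price + (1 - self.alpha) * self.value
        return self.value


sma, ema = StreamingSMA(3), StreamingEMA(3)
ticks = [1.0, 2.0, 3.0, 4.0]
sma_values = [sma.update(p) for p in ticks]
ema_values = [ema.update(p) for p in ticks]
```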
SME Transaction Categorization


The classification of SME transactions is vital for the further development of
financial management microservices. The main challenge when developing a
transaction categorization model is the absence of labelled data.

This model is created by first hand-labelling a representative subset using expert
knowledge, yielding a rule-based model that can then be integrated with a
supervised machine learning model, offering a high degree of update automation
and transaction re-classification.
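The first stage, expert rules that label transactions automatically, might look like the Python sketch below. The categories and keywords are hypothetical examples, not the rules used for the INFINITECH model; labels produced this way could then serve as training data for the supervised model. Matching whole words avoids false hits such as "rent" inside "current".

```python
import re

# Hypothetical expert rules: category -> keywords (checked in insertion order)
RULES = {
    "payroll": {"salary", "payroll", "wages"},
    "rent": {"rent", "lease"},
    "utilities": {"electricity", "water", "telecom"},
}


def categorize(description: str, rules=RULES) -> str:
    """Return the first category whose keywords appear as whole words, else 'uncategorized'."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for category, keywords in rules.items():
        if words & keywords:
            return category
    return "uncategorized"


labels = [categorize(d) for d in ["SEPA transfer March SALARY",
                                  "Office lease Q1",
                                  "Current account fee"]]
```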


PSD2 Open Banking Chatbot SDK powered by botakis 


Botakis is an AI-powered intelligent agent platform that supports automated
operations in a conversational way. It is compatible with any device, operating
system, and messaging platform, and is based on open-source software. The SDK for
the PSD2 chatbot is available through the INFINITECH project.

Using this SDK, developers can interface with a bank's open banking API to create
a chatbot through which the user can give consent to execute AIS (account
information service) or PIS (payment initiation service) scenarios, depending on
the case.


BC Based Secure Execution Environment and 
Data Marketplace for Federated Learning 


This work is a collaboration between FBK and IBM within the INFINITECH
project. FBK has developed a fraud detection federated learning algorithm, run by
a consortium of organizations on their own nodes and their own data, while sharing
the intermediate learning results with each other to achieve higher precision.
IBM provides a blockchain-based secure execution framework for the distributed 
fraud algorithm execution, recording all the meta-information regarding the exe- 
cution, and recording the shared intermediate results on a transparent, immutable, 
and verifiable ledger. At the end of the execution a completed model can be shared 
as a tradeable asset on a blockchain-based data marketplace also developed by IBM. 


3.10 Financial Risk Estimation 


DeepVaR: Value-at-Risk Prediction Leveraging Deep Learning 


This notebook both explains and implements DeepVaR, a Value-at-Risk model based on
deep neural networks and Monte Carlo simulations. The DNN is used to estimate the
parameters of the distribution of portfolio returns, which are then used to
produce the MC samples. For the DNN, the DeepAR estimator from the GluonTS package
is used to perform probabilistic forecasts, and the VaR is calculated from DeepAR’s
output.
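The last stage (from forecast distribution parameters to a VaR figure) can be sketched in Python. The sketch assumes, for simplicity, a Gaussian return distribution whose mean and standard deviation are supplied by hand; in DeepVaR these parameters come from the DeepAR forecast, and the distribution family may differ. It draws Monte Carlo samples, reads the 99% VaR off the empirical 1% quantile, and compares the result with the closed-form Gaussian quantile from `statistics.NormalDist`.

```python
import random
from statistics import NormalDist


def monte_carlo_var(mu: float, sigma: float, alpha: float = 0.99,
                    n_samples: int = 100_000, seed: int = 0) -> float:
    """VaR (as a positive loss) from Monte Carlo samples of a Gaussian return."""
    rng = random.Random(seed)
    samples = sorted(rng.gauss(mu, sigma) for _ in range(n_samples))
    index = int((1 - alpha) * n_samples)  # position of the (1 - alpha) quantile
    return -samples[index]


def gaussian_var(mu: float, sigma: float, alpha: float = 0.99) -> float:
    """Closed-form Gaussian VaR, for comparison with the simulation."""
    return -NormalDist(mu, sigma).inv_cdf(1 - alpha)


# Hypothetical one-day forecast: mean return 0.05%, volatility 1.2%
mc = monte_carlo_var(0.0005, 0.012)
exact = gaussian_var(0.0005, 0.012)
```

With 100,000 samples the simulated VaR typically lands within about 0.0002 of the closed-form value; Monte Carlo pays off when the forecast distribution has no convenient closed-form quantile.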


3.11 Fraud Detection 


ML-based Fraud Detection System 


The goal of the ML-based Fraud Detection System is to detect fraud in the banking
industry related to loan requests for vehicle purchases. This tool simulates an
ML-based system that has been integrated into an API and connected to a UI. This
asset is a result of participation in the INFINITECH Datathon.


AI-Based Detection of Fraudulent Immediate Loan Requests


The asset demonstrates the capabilities of an AI-based model for detecting
fraudulent requests for immediate loans. The AI model facilitates quick analysis
of requests and thus reduces the workload of the bank’s SOC analysts.

With the help of the model and a lightweight visualization that presents the
scoring and provides explanations, the SOC analyst can access information quickly,
and decisions taken in combination with other security tools are simplified and
accelerated.


Fraud Detection System 


A robust fraud detection system that aims to improve the detection rate of
malicious events (i.e., fraud attempts) and to identify security-related anomalies
as they occur, by analysing in real time the financial transactions of a home and
mobile banking system.


Graph Anomaly Detection 


Exploration of data analytics and advanced machine learning techniques to detect
anomalies in bitcoin transaction data. The code was developed in Python using
advanced neural network libraries such as PyTorch and PyTorch Geometric. The
business model is pay-as-you-go and is under development.


3.12 Mobile Application 


DUOS - Digital User Onboarding System 


Digital User Onboarding System (DUOS): a mobile application for managing virtual
identities on a mobile device.
3.13 Model


Binary Wellbeing Assessment RF Model 


Random Forest model to predict short-term wellbeing variation, learnt from the
“Processed Synthetic RWD for binary modelling” dataset, yielding 76.7% balanced 
accuracy. 


Tristate Wellbeing Assessment RF Model 


Random Forest model to predict short-term wellbeing variation, learnt from the
“Processed Synthetic RWD for tristate modelling” dataset, yielding 64.2% balanced 
accuracy. 
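Balanced accuracy, the metric quoted for both models, is the mean of the per-class recalls, so a model cannot score well just by predicting the majority class. A minimal Python version is sketched below (by definition it matches scikit-learn's `balanced_accuracy_score` with its default `adjusted=False`); the class labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Average recall over the classes present in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


y_true = ["better", "better", "better", "worse"]
y_pred = ["better", "better", "better", "better"]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.75
score = balanced_accuracy(y_true, y_pred)  # 0.5: the "worse" class is never found
```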


3.14 Monitoring Tool 


Blue Behavior 


It is a prediction model that combines satellite data, logbook data, vessel
monitoring and staff information, among others, to rate vessel and fisher
behavior. When the model measures a higher probability of illegal activity, a
higher probability of inspection by the authorities is triggered, which decreases
the incentive for vessels and fishermen to engage in illegal fishing.


3.15 Notebook 


Graph-based Anomaly Exploration Platform 


The Graph-based Anomaly Exploration Platform is a Jupyter Server engine for data
scientists. It is a server-based Jupyter notebook that allows expert data
professionals to develop scripts and perform more customized analyses.

The data specialist can interact with algorithm outputs or payments data: it is 
possible for him/her to write code and therefore explore results with customized 
queries and/or algorithms. 


SMEs Cashflow Prediction 


This notebook demonstrates, step by step, how to make probabilistic predictions of
future outflows and inflows based on historical transactions. It is applied to SME
data, but it can be fine-tuned and applied to cashflow prediction in general.
Portfolio-Value-at-Risk-estimation


This notebook demonstrates, step by step, how to estimate and evaluate the
Value-at-Risk (VaR) of financial portfolios.
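A common way to evaluate VaR estimates is backtesting: count the days on which the realized loss exceeded the forecast VaR and test whether that count is consistent with the expected rate 1 - alpha. The Python sketch below implements violation counting and Kupiec's proportion-of-failures likelihood-ratio statistic, which is compared with a chi-square critical value with one degree of freedom (about 3.84 at the 5% level). The notebook may use other evaluation methods; the numbers here are illustrative.

```python
import math


def count_violations(returns, var_forecasts):
    """Days whose realized loss exceeded the VaR forecast, and the number of days compared."""
    hits = sum(1 for r, v in zip(returns, var_forecasts) if -r > v)
    return hits, min(len(returns), len(var_forecasts))


def kupiec_pof(violations: int, days: int, p: float) -> float:
    """Kupiec LR statistic; p is the expected violation rate (1 - alpha)."""
    def log_likelihood(q: float) -> float:
        # Bernoulli log-likelihood; terms with a zero count contribute 0 (0 * log 0 := 0)
        ll = 0.0
        if days - violations:
            ll += (days - violations) * math.log(1 - q)
        if violations:
            ll += violations * math.log(q)
        return ll
    return -2.0 * (log_likelihood(p) - log_likelihood(violations / days))


hits, days = count_violations([-0.03, 0.01, -0.005], [0.02, 0.02, 0.02])
lr_ok = kupiec_pof(5, 500, 0.01)    # observed rate equals the 1% target
lr_bad = kupiec_pof(15, 500, 0.01)  # three times too many violations
```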


3.16 Recommender 


Financial Asset Recommender: Profitability Estimation with 
Sentiment 


This is a supervised financial asset recommender service. It takes as input
historical asset pricing data (daily market closing prices), together with
sentiment indicators derived from news headlines mapped to those assets, and
predicts the future price of each asset (one month in advance). This is a publicly
available containerized service. This component comes in two main flavors, namely:


1. a Python Jupyter notebook tutorial,
2. a containerized microservice with a REST API that can host a previously trained
recommendation model.


New users should typically start with the Python Jupyter notebook tutorial, which
guides them through training a recommendation model on past data and evaluating
its performance. Users who already have trained models can then use the
microservice container to host them.
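The text does not say how headline sentiment becomes an indicator. One plausible, hypothetical scheme is to score each headline as P(positive) - P(negative), using class probabilities such as those a financial sentiment classifier produces, and then average the scores per asset and day. The sketch below assumes that scheme. The record keys and tickers are illustrative only.

```python
from collections import defaultdict

def daily_sentiment(headlines):
    """Average (positive - negative) sentiment score per (asset, day).

    headlines: iterable of dicts with hypothetical keys
    'asset', 'date', 'positive', 'negative' (class probabilities).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h in headlines:
        key = (h["asset"], h["date"])
        sums[key] += h["positive"] - h["negative"]   # net sentiment of one headline
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

records = [
    {"asset": "ACME", "date": "2021-03-01", "positive": 0.80, "negative": 0.10},
    {"asset": "ACME", "date": "2021-03-01", "positive": 0.20, "negative": 0.60},
    {"asset": "GLOBEX", "date": "2021-03-01", "positive": 0.05, "negative": 0.90},
]
print(daily_sentiment(records))
```

Each resulting daily score, which ranges from -1 to 1, could then be joined with the closing price of the same asset and day to form the model's input features.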


Financial Asset Recommender: Profitability Estimation 


This is a supervised financial asset recommender service. It takes as input the
historical pricing data of an asset (weekly market closing prices) and predicts the
annualized return on investment (i.e. profit %) for that asset. It is a publicly
available containerized service that comes in two main flavors:


1. a Python Jupyter notebook tutorial,
2. a containerized microservice with a REST API that can host a previously trained
recommendation model.


New users should typically start with the Python Jupyter notebook tutorial, which
guides them through training a recommendation model on past data and evaluating
its performance. Users who already have trained models can then use the
microservice container to host them.
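The prediction target, annualized return on investment, can be derived from weekly closes by compounding. If a price grows by a factor g over n weekly intervals, the annualized return is g^(52/n) - 1. The minimal sketch below assumes 52 weeks per year and uses a made-up price series. It only shows how such a target could be computed from history, not how the service's model predicts it.

```python
def annualized_return(weekly_closes, periods_per_year=52):
    """Annualized return (as a fraction) implied by a series of weekly closes."""
    n_periods = len(weekly_closes) - 1            # number of weekly intervals
    growth = weekly_closes[-1] / weekly_closes[0]  # total growth factor
    return growth ** (periods_per_year / n_periods) - 1

# Hypothetical weekly closing prices of one asset
prices = [100.0, 101.5, 99.8, 103.2, 104.0, 106.1]
print(f"{annualized_return(prices):.2%}")
```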
3.17 Screening Tool


Screening Tool Framework 


The anomaly detection, pattern detection and SCT framework encompasses all the
building blocks of the BSI testbed and provides the basic functionality of the screening
tool. The provided Dockerfile imports transaction data and metadata, fuses them,
prepares a suitable data model and loads it for further use.


3.18 Service 


Sentiment Analysis in Financial News 


This service performs sentiment analysis on financial news using the FinBERT
pre-trained model provided by ProsusAI and Hugging Face
(https://huggingface.co/ProsusAI/finbert). FinBERT is a pre-trained NLP model for
analyzing the sentiment of financial text. It is built by further training the BERT
language model in the finance domain on a large financial corpus, and then
fine-tuning it for financial sentiment classification.


3.19 SMEs Business Risks 


SMEs Dataset Business Risk 


A dataset composed of variables that help predict the different business risks of SMEs.


3.20 Software 


AI Profiler Service


The service uses the k-means clustering method to group the different driving routes
into clusters according to the values reported by the connected cars. Each route is
assigned to one of the resulting clusters, which allows insurance companies to better
calculate the prices of insurance policies. The repository contains the files required
to deploy the service that provides the driver profiler AI component.
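The text does not specify the route features or the number of clusters. The sketch below is a plain implementation of Lloyd's k-means on hypothetical per-route features: average speed in km/h and harsh-braking events per 100 km. Production code would more likely use a library implementation such as scikit-learn's `sklearn.cluster.KMeans`. It should also standardize the features first, because k-means is distance based and the speed feature would otherwise dominate.

```python
import random

def _sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # start from k of the input points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: _sq_dist(pt, centroids[c])) for pt in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:              # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-route features: (average speed km/h, harsh brakings per 100 km)
routes = [(30, 0.5), (32, 0.4), (29, 0.6), (90, 3.0), (88, 2.8), (92, 3.2)]
labels, centroids = kmeans(routes, k=2)
print(labels, centroids)
```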


Smart Fleets Platform 


Repository hosting the Smart Fleets platform images and the necessary deployment
files.
3.21 TAHO


The Traffic Analysis Hub Ontology 


TAHO is an ontology that represents Traffic Analysis Hub data as a semantic
model.


3.22 Visualization 


Stream Story 


Stream Story is a tool intended to help with the analysis and interpretation of
time-varying data. The system helps the user search for recurrent patterns by
representing the data as a diagram of states and transitions, where recurrent
patterns visually stand out.

To construct its representation, Stream Story uses and adapts several machine
learning techniques. Stream Story’s mechanisms to uncover and explain the
structure within the data are visual (hierarchical Markov chain, charts, decision
trees, parallel coordinates) and textual (a narrative explaining/summarising the
states and patterns within the data).


Veesualive 


Business intelligence platform that provides advanced data exploration and
visualisation functionalities.


3.23 Other 


Tokenization on Hyperledger Fabric - ERC20 Chaincode


ERC20 chaincode for tokenization on Hyperledger Fabric.
Chapter 4


INFINITECH Continuous Integration/ 
Continuous Deployment Tools 


4.1 Tools and Techniques for Testbeds and Sandboxes


This chapter describes the tools and techniques that will be used to implement the
testbed and sandbox concepts within the INFINITECH project. The INFINITECH-RA
is designed around a microservices [1] architecture, with services interacting with
each other through REST APIs.

The key pillars for implementing and deploying a microservices-based architecture
are container technology [2] and its leading open-source container orchestration
solution, Kubernetes [3]. To explain these methodological and technological choices
for the realization of the INFINITECH-RA, this section presents some key concepts
related to containers, microservices and Kubernetes, and shows how their appearance
in IT environments has deeply changed software development approaches.

Finally, it details how the concrete use of these technologies enables the definition
of the INFINITECH testbed and sandbox concepts.
4.1.1 Containers Benefits


In the last few years, the spread of containers has driven a strong transformation,
similar to the one brought by the advent of virtualization in the early 2000s, leading
to a rethinking of both how infrastructure is managed and how applications are
designed and built.

The advent and diffusion of containers have improved the computational
management of infrastructures. Containers remove the overhead introduced by the
hypervisor, the software layer that virtualizes the HW resources of a server and
shares them among several applications. They rely instead on functionalities
already available within the Linux OS kernel, as shown in Figure 4.1.

The container engine (to date, the most widely used one is Docker [4]) takes
advantage of the following Linux kernel features to run containers isolated from
each other, like VMs:


* Linux Control groups: allow each container to get its fair share of memory,
CPU, disk I/O and network stack. At the same time, a single container cannot
bring the system down by exhausting one of those resources.

* Linux Namespaces: isolate the processes running in a container from processes
running in another container or in the host system.

In short, a container is a package that contains code, system libraries, dependencies
and software tools. An example is given in Figure 4.2.

The main advantages of containers with respect to VMs can be summarized in these
macro points:


* Size: a container is small.

* Overhead: no full OS is required.

* Speed: boot time is faster.

* Scaling: real-time provisioning.
Figure 4.1. VM vs container.
Figure 4.2. Container kernel properties.
Figure 4.3. Monolithic vs microservice.


At the same time, taking advantage of containers requires rethinking and redesigning
today's monolithic applications as microservices-based applications.


4.1.2 Microservices Approach


As already mentioned, the spread of containers required developers to change their
approach to building applications. They moved from monolithic applications, where
the various components (generally UI, business logic and data layer) were strongly
coupled, to microservices applications, where the various components (i.e.
microservices) are decoupled from each other (see Figure 4.3).


This methodological change has both advantages and disadvantages. 


The advantages can be summarized as: 


* Simple to develop: each microservice is independent and small.

* Simple to upgrade: since each microservice is independent, each component can
be upgraded independently.

* Simple interaction: microservices communicate with each other through
well-defined and standard interfaces (APIs).

* Simple to scale: each microservice can be scaled independently.

* Scale on demand: multiple instances of a microservice can run behind a load
balancer to scale with the volume of requests.
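Round-robin is one of the simplest policies a load balancer can use to spread requests across several instances of the same microservice. The minimal sketch below uses hypothetical replica addresses. Real balancers, including Kubernetes Services, apply their own selection policies, which are not necessarily round-robin.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands requests to the replicas of one microservice in turn."""

    def __init__(self, replicas):
        self.scale_to(replicas)

    def scale_to(self, replicas):
        # Replace the replica set, e.g. after scaling the microservice out or in
        replicas = list(replicas)
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = cycle(replicas)

    def next_replica(self):
        return next(self._replicas)

# Hypothetical replica addresses of a single microservice
lb = RoundRobinBalancer(["10.0.0.11:8080", "10.0.0.12:8080"])
print([lb.next_replica() for _ in range(3)])
lb.scale_to(["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"])  # scale out
print([lb.next_replica() for _ in range(3)])
```

Scaling on demand then amounts to changing the replica set. The balancer keeps spreading requests evenly across whatever instances are currently running.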
The disadvantages can be summarized as:


* Complexity: splitting an application into multiple independent microservices
increases the complexity of the deployment process.

* Monitoring: monitoring each microservice requires collecting and managing many
metrics and logs.

* Performance: all communications occur over the network, so they are slower than
in-memory communications.

* Debugging: a monolithic application is much easier to debug and test because it
is a single indivisible unit.


Our approach to the rapid, iterative development of a microservices-based software
infrastructure builds upon a widely used methodology, DevOps.


4.1.3 Kubernetes Containers Orchestration


The arrival on the market of containers and related microservices-based applications
had two effects. On the one hand, it enabled applications that scale quickly according
to requirements and can be easily updated. On the other hand, software previously
managed as a single indivisible piece was split into several dozens of microservices
(containers), which made it more difficult to manage.

In this context, a tool able to manage the lifecycle of microservices (deployment,
scaling and management) became necessary. Such a tool was developed by Google
under the name "Project Seven of Nine" and released as open-source software in
2014. Today it is widely known as Kubernetes.

Kubernetes provides: 


* Service discovery and load balancing. Kubernetes can expose a container 
using the DNS name or using its own IP address. If traffic to a container 
is high, Kubernetes can load balance and distribute the network traffic so 
that the deployment is stable. 

* Storage orchestration. Kubernetes can automatically mount storage systems of
different types, such as local storage, public cloud providers and more.

* Automated rollouts and rollbacks. You can describe the desired state for your 
deployed containers using Kubernetes, and it can change the actual state to 
the desired state at a controlled rate. For example, you can automate Kuber- 
netes to create new containers for your deployment, remove existing contain- 
ers and adopt all their resources to the new containers. 

* Automatic bin packing. You provide Kubernetes with a cluster of nodes that it 
can use to run containerized tasks. You tell Kubernetes how much CPU and 
memory (RAM) each container needs. Kubernetes can fit containers onto 
your nodes to make the best use of your resources. 
* Self-healing. Kubernetes restarts containers that fail, replaces containers, kills
containers that do not respond to your user-defined health check, and does
not advertise them to clients until they are ready to serve.

* Secret and configuration management. Kubernetes lets you store and manage
sensitive information, such as passwords, OAuth tokens, and SSH keys.
You can deploy and update secrets and application configuration without 
rebuilding your container images, and without exposing secrets in your stack 
configuration. 
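To make "automatic bin packing" concrete, the sketch below places containers with first-fit decreasing over their CPU and memory requests. This only illustrates the idea. The real kube-scheduler filters and scores nodes pod by pod using configurable plugins rather than running this algorithm. Node names, container names and sizes are hypothetical. CPU is given in millicores and memory in MiB.

```python
def first_fit_decreasing(containers, nodes):
    """Place containers on nodes so that their CPU and memory requests fit.

    containers / nodes: dict name -> (cpu_millicores, memory_mib); for nodes
    this is the allocatable capacity. Returns dict container -> node name, or
    None when a container fits on no node.
    """
    free = {node: list(cap) for node, cap in nodes.items()}  # remaining capacity
    placement = {}
    # Largest requests first (ordered by CPU, then memory)
    for name, (cpu, mem) in sorted(containers.items(), key=lambda kv: kv[1], reverse=True):
        placement[name] = None
        for node, (free_cpu, free_mem) in free.items():
            if cpu <= free_cpu and mem <= free_mem:    # first node with room wins
                free[node][0] -= cpu
                free[node][1] -= mem
                placement[name] = node
                break
    return placement

# Hypothetical cluster of two worker nodes and five containers
nodes = {"node-a": (2000, 4096), "node-b": (2000, 4096)}
containers = {"api": (500, 512), "db": (1500, 3072), "worker": (1000, 1024),
              "cache": (500, 1024), "batch": (4000, 1024)}
print(first_fit_decreasing(containers, nodes))
```

The oversized "batch" container stays unplaced. Kubernetes similarly leaves a POD in the Pending state when no node has enough allocatable resources for it.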


4.1.4 Kubernetes Architecture


A Kubernetes (aka K8s) cluster is made up of two macro blocks: the Control Plane
(Master) and the Data Plane (Worker).

The Control Plane is the brain of the cluster and is internally made up of the
following components:


* Kube-APIserver is the component that exposes the cluster API. It is the main
component, since Kubernetes has been designed and built to base all operations
on the use of the API.

* Etcd is a key-value database that maintains all the information related to the
status of the cluster.

* Kube-scheduler decides on which nodes of the Data Plane the containers run
(in Kubernetes, containers run inside PODs, the smallest deployable units of
computing that can be created and managed in Kubernetes), based on the
required resources, the cluster status and the affinity and anti-affinity rules.

* Kube-controller-manager consists of a set of control processes which:

o Check whether the cluster nodes are active.

o Check whether the number and status of the running PODs match the
required ones.

o Control and create tokens to access the K8s resources.

o Populate the Endpoints object (that is, join Services & Pods).


The Data Plane is where the workload is carried out, i.e. where the PODs are
executed. It comprises the following components:


* Kube-proxy is a proxy that allows communication to the PODs from within
and outside the cluster.

* Kubelet checks that PODs are running.

* Container runtime (engine) is the software responsible for running containers;
Kubernetes can use several runtimes (e.g. Docker, CRI-O and containerd).


The Kubernetes cluster components are depicted in Figure 4.4.


For an outline of the entire Kubernetes solution, see the documentation on
kubernetes.io.
Figure 4.4. Kubernetes architecture.


For the purpose of this chapter, however, it is sufficient to outline two main
Kubernetes concepts that we will use later to implement the Sandbox concept.


1. Namespaces: a logical grouping of a set of Kubernetes objects to which some
policies can be applied, in particular:

o Quota sets limits on how many HW resources can be consumed by all the
objects in the Namespace.

o Network defines whether the Namespace can be accessed by, or can access,
other Namespaces; in other words, whether the Namespace is isolated or
accessible. Different policies can be given to different Namespaces.

2. POD (Figure 4.5) is the simplest unit in the Kubernetes object model. A POD
usually encapsulates one container, but in some cases (when the application is
complex) a POD can encapsulate more than one container. Each POD has its
own storage resources, a unique network IP, access ports and options related to
how the container(s) should run.


Figure 4.5. Kubernetes POD.
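The two Namespace policies described above correspond to standard Kubernetes objects. A ResourceQuota implements the quota, and a NetworkPolicy implements isolation. The sketch below builds both as Python dicts and prints them as a JSON `List`, which `kubectl apply -f` accepts as well as YAML. The namespace name and limits are hypothetical. NetworkPolicies take effect only if the cluster's network plugin enforces them.

```python
import json

def sandbox_policies(namespace, cpu, memory, max_pods):
    """Build a ResourceQuota and an isolating NetworkPolicy for one Namespace."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "sandbox-quota", "namespace": namespace},
        # Caps on what all objects in the Namespace may request in total
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory,
                          "pods": str(max_pods)}},
    }
    isolation = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},                 # applies to every Pod in the Namespace
            "policyTypes": ["Ingress"],
            # Allow ingress only from Pods in this same Namespace
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [quota, isolation]}

print(json.dumps(sandbox_policies("usecase-1", "4", "8Gi", 20), indent=2))
```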


4.2 INFINITECH Testbeds 


INFINITECH will make available a number of testbeds for the experimentation,
testing and validation of Big Data, AI and IoT solutions, including:


* 10 testbeds that are established in the data centres of incumbent financial
organizations (on premise testbeds).

* 1 testbed that will be provisioned and established in order to support the
experimentation of the FinTech and InsurTech pilots and enterprises of the
consortium, hosted on the partner NOVA’s Data Center.

* 1 testbed that will be provisioned and established in order to support
the experimentation of the INFINITECH blueprint reference testbed (see
Section 4.2 of the deliverable, associated with one of the official INFINITECH
pilots), hosted on the AWS (Amazon Web Services) [6] public provider.


Accordingly, the current INFINITECH project plan (at the time of writing, and
subject to possible evolutions along the project lifecycle) is to deliver 15 pilots:
10 of the 15 will be carried out on dedicated on premise Data Centres, while the
remaining 5 will be carried out on NOVA’s Data Centre, a shared INFINITECH
Data Centre. In addition, a blueprint reference testbed will be provided, built upon
the requirements of one of the INFINITECH pilots, as stated above.

A set of hardware resources such as storage, compute and network is considered a
testbed, as shown in Figure 4.6.

It is not relevant where these resources are deployed: they can be inside a private 
Data Centre or in any cloud provider. 

Therefore, the 15 pilots that have been foreseen will be executed in 1042
testbeds, in addition to the blueprint reference testbed, as shown in Figure 4.7.
Figure 4.6. Testbed.
Figure 4.7. Testbeds and pilots.


4.3 INFINITECH Sandboxes 


Each INFINITECH pilot will have one or more Use Cases (realized by one or more
pilot Apps, each one realized by one or more INFINITECH microservices). In our
vision, each Use Case will be a Sandbox provisioned by means of Kubernetes
Namespaces.

Indeed, as mentioned in the previous section, the Kubernetes Namespace feature
makes it possible to logically isolate the objects (mainly PODs) inside a Namespace
from those in other Namespaces. Therefore, each dedicated Testbed will have only
one Kubernetes cluster, with as many Namespaces as the number of Use Cases to be
implemented for a single pilot (see Figure 4.8). In the other case, each shared
Testbed (2 out of 10) will have one Kubernetes cluster for each pilot it has to host
and manage (see Figure 4.9).

Figure 4.8. Sandboxes in a dedicated Testbed.
Figure 4.9. Sandboxes in a shared Testbed.

Finally, in this general context, each pilot App will be realized as a Kubernetes
POD.
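Putting the two Kubernetes concepts together, a dedicated testbed could be described by manifests like the ones generated below: one Namespace per Use Case (Sandbox) and one POD per pilot App. This is a hypothetical sketch. The Use Case, App and image names are placeholders, and real deployments would more typically wrap each App in a Deployment than run a bare POD.

```python
import json

def sandbox_manifests(use_cases):
    """Build one Namespace per Use Case (sandbox) and one POD per pilot App.

    use_cases maps a Use Case name to {app name: container image}; all names
    here are hypothetical and must be valid Kubernetes (DNS-1123) names.
    """
    items = []
    for use_case, apps in use_cases.items():
        items.append({"apiVersion": "v1", "kind": "Namespace",
                      "metadata": {"name": use_case}})
        for app, image in apps.items():
            items.append({
                "apiVersion": "v1",
                "kind": "Pod",
                "metadata": {"name": app, "namespace": use_case},
                "spec": {"containers": [{"name": app, "image": image}]},
            })
    return {"apiVersion": "v1", "kind": "List", "items": items}

# Hypothetical pilot with two Use Cases hosted in one dedicated testbed cluster
pilot = {
    "uc-fraud-detection": {"fraud-scoring": "registry.example.com/fraud-scoring:0.1"},
    "uc-credit-risk": {"risk-api": "registry.example.com/risk-api:0.1",
                       "risk-dashboard": "registry.example.com/risk-dashboard:0.1"},
}
print(json.dumps(sandbox_manifests(pilot), indent=2))
```

In a shared testbed, the same generator would simply run once per pilot, against that pilot's own cluster.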


DOI: 10.1561/9781638282334.ch5


Chapter 5 


INFINITECH Tools and Techniques for 
Management of Data Sets 


5.1 Tools and Techniques for Management of Datasets 


This chapter describes the tools and techniques that will be leveraged for the man- 
agement of datasets within the INFINITECH project and how they are linked and 
mapped with the concepts and techniques related to testbeds and sandboxes and 
also with respect to the blueprint reference testbed environment. 


5.1.1 Data Sources and Data Access


The INFINITECH platform aims to host and serve the needs of a variety of appli- 
cations and tools used by the finance and insurance sectors, which have diverse 
needs and requirements for data access. We envisage that different types of data 
sources will be supported such as: 


* structured, semi-structured or completely unstructured data,

* static data (often called data at rest) or streaming data, where the source might
be stored on premises or in a third-party organization, or might need to be
imported into the deployed sandbox. For instance, a common solution might
need to use historical data to perform post-processing and apply ML/DL
analytics for risk assessment.


However, modern enterprises are not only interested in extracting knowledge
and identifying risks or opportunities from historical and often obsolete data;
they also need to extract this type of information from real-time data, which
introduces a significant data management challenge from the perspective of the
tailored sandbox. To further complicate the needs of the platform, other requirements
coming from the finance and insurance sectors are the identification of potential 
fraudulent financial transactions as they occur, or pre-processing of data from IoT 
devices as it arrives. Data analysts often tend to use different datasets that can be 
retrieved from third party organizations or other external sources such as social 
media to enrich their algorithms with deeper knowledge in order to build a more 
accurate profile of their customers, either referring to individuals or to enterprises. 
Finally, additional considerations must be taken into account when dealing with the 
volume and privacy of the data of a finance organization, where moving data from 
the source to a sandbox might not be feasible either due to regulatory constraints, 
or simply due to the overall volume of the datasets. 

To address all these requirements, the INFINITECH-RA has been designed to take
into account the different types of data sources. Details of the implementation of
the functionality needed for data management and processing are given in the
corresponding deliverables of WP3, WP4 and WP5.

When looking at the techniques and tools that will allow the tailored sandboxes to
manage data in these diverse environments, the focus is on data access: in other
words, on how the different components deployed inside the sandbox can be
integrated so that they can make use of all the aforementioned data sources.

We can identify six different modes of data access:


* Static data ingestion.

* Dynamic data ingestion.

* IoT streaming.

* Direct access to on premise data sources.

* Blockchain data access.

* Third party data access.


In the following subsections, we describe each of these means of data access in
detail and analyze the requirements and technical considerations that the
infrastructure needs to tackle when orchestrating the deployment and maintenance
of the integrated solution. The next section gives more insight into how the
INFINITECH platform overcomes the barriers and deals with the challenge of
securely allowing these diverse means of data access, thus enabling the automated
deployment of the integrated solution.


5.1.2 Static Data Ingestion


This is the most typical data access case, and it is expected to be a requirement
for the majority of the solutions that will be deployed into an INFINITECH
sandbox. Data analysts have their own data stored in a database management
system or another form of persistent storage. They need to extract a specific dataset
and ingest it into the central data repository deployed inside the sandbox, so that
they can use the tools provided by the platform to perform the analysis. To do so,
they extract this information into a static file in a specific format (e.g. CSV) and
trigger the INFINITECH data ingestion process. We call this data access mode
static data ingestion, as the file is created once and used in each deployment.
This access mode also covers the needs of a data analyst who wants to use an
existing synthetic dataset that has been made available to the platform via the
INFINITECH marketplace. The generated file must be stored in a persistent
storage volume, and the latter must be visible from the sandbox so that the tools
deployed inside it can access it. This mainly refers to the data ingestion component,
which takes care of the data migration process: it possibly applies a pre-processing
algorithm to clean, harmonize and anonymize the dataset, and finally stores the
raw data in the data repository. It will also support cases where the analytical
tools of the analytics layer of the RA need direct access to the static file and no
additional processing by the lower logical levels of the architecture is needed (e.g.
a Spark job needs to read everything from a CSV file, and no additional processing,
such as a filter or join operation, can be pushed down to the processing layers).
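The clean, harmonize and anonymize steps of static ingestion can be sketched on a small CSV export. The column names, the key and the specific rules below are hypothetical. The keyed hash (HMAC-SHA256) only pseudonymizes the customer identifier, and it does so only as long as the key stays secret. Full anonymization under regulations such as the GDPR may require further measures.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret, kept outside the dataset so pseudonyms cannot be recomputed
SECRET_KEY = b"replace-with-a-managed-secret"
REQUIRED = ("customer_id", "amount", "currency")   # hypothetical column names

def ingest_static_csv(text, key=SECRET_KEY):
    """Clean, harmonize and pseudonymize an exported transactions CSV."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Clean: skip rows with a missing or empty required field
        if not all((row.get(col) or "").strip() for col in REQUIRED):
            continue
        customer = row["customer_id"].strip().encode()
        rows.append({
            # Pseudonymize: a keyed hash replaces the raw customer identifier
            "customer": hmac.new(key, customer, hashlib.sha256).hexdigest()[:16],
            # Harmonize: numeric amounts and upper-case currency codes
            "amount": round(float(row["amount"]), 2),
            "currency": row["currency"].strip().upper(),
        })
    return rows

export = "customer_id,amount,currency\nC001,12.50,eur\nC002,,EUR\nC001,7,usd\n"
for record in ingest_static_csv(export):
    print(record)
```

Because the same customer always maps to the same pseudonym, per-customer analytics still work on the ingested data.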


5.1.3 Dynamic Data Ingestion


In this scenario, data ingestion does not take place only once, at the initialization
of the sandbox; rather, the data might be migrated periodically to the data
repository. We call this dynamic data ingestion, as the dataset inside the sandbox
is updated with new data after a given period of time. This covers cases where the
sandbox must be deployed, integrated and synchronized with the data sources of
the organization. For instance, the integrated solution might need to be updated
with the current snapshot of the data, so that the data analyst can benefit from a
daily picture of the transactions of any given customer. Rather than having a
human manually extract the needed data into a static file that is then loaded into
the system, the data provider can implement specific APIs so that the data
migration can be done in a fully automated manner every time (e.g. at the end of
each business day). The INFINITECH data ingestion component will need to open
a connection to those APIs in order to retrieve the data according to the specified
protocol or configuration. The APIs can vary, from REST implementations that
imply HTTP connections, to database-specific ones (JDBC, ODBC, etc.) that imply
TCP connections, or even SFTP connections to a file server. This introduces the
requirement for the sandbox to allow the INFINITECH component responsible for
this job to access endpoints outside the sandbox using different communication
protocols. Regarding data access, only the data ingestion component needs access
to those endpoints, in order to migrate the data into the central data repository.
All other components in the different logical layers of the RA will retrieve the data
via the data repository.
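A common way to implement periodic ingestion is an incremental pull driven by a watermark. Each scheduled run fetches only the records updated after the last one it saw. The sketch below uses a hypothetical `fetch_since` function as a stand-in for the provider's API and an in-memory list as the data repository. Inside a sandbox, such a job could be scheduled, for example, with a Kubernetes CronJob.

```python
def incremental_sync(fetch_since, repository, watermark):
    """One scheduled run of dynamic ingestion: pull records newer than watermark.

    fetch_since(ts) is a hypothetical stand-in for the provider's API (REST,
    JDBC, SFTP, ...) and must return records whose 'updated_at' is later than ts.
    Returns the watermark to persist for the next run.
    """
    new_records = sorted(fetch_since(watermark), key=lambda r: r["updated_at"])
    repository.extend(new_records)          # load into the central data repository
    if new_records:
        watermark = new_records[-1]["updated_at"]
    return watermark

# Simulated provider data with ISO-8601 timestamps (these compare correctly as strings)
provider = [{"id": 1, "updated_at": "2021-03-01T18:00"},
            {"id": 2, "updated_at": "2021-03-02T18:00"}]

def api(ts):
    return [r for r in provider if r["updated_at"] > ts]

repo, mark = [], ""
mark = incremental_sync(api, repo, mark)          # first end-of-day run
provider.append({"id": 3, "updated_at": "2021-03-03T18:00"})
mark = incremental_sync(api, repo, mark)          # next run fetches only id 3
print([r["id"] for r in repo], mark)
```

A strictly-later comparison would miss a record that arrives late with a timestamp equal to the watermark. Production jobs therefore often re-read a small overlap and de-duplicate the results.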


5.1.4 IoT Streaming


As already mentioned, it is increasingly crucial for modern applications in the
finance and insurance sector to perform real-time analysis in order to respond to
possible risks, identify opportunities or even detect fraudulent transactions as they
occur. This type differs from the others as it does not rely on data at rest, but on
streaming data. As a result, complex event processing over data streams is
becoming popular. Data transmitted over a stream is usually small. It may come
from a sensor deployed in a vehicle or in the soil, or describe a financial
transaction. It may also be logging information produced while a user navigates
the web, or even a tweet or post on social media. We consider all these types of
data as IoT, and we call this data access mode IoT Streaming. The unified query
processing framework of INFINITECH will require access to data streams, as it
provides the means to deploy continuous queries that perform live processing over
the content of the streams. These queries may make use of the other layers that
allow operators targeting live data to be combined with static data at rest. The
unified query processing framework contains a streaming engine that consumes the
stream from the source. The requirement for the infrastructure in this case is to
allow the components deployed inside the sandbox to be accessible from outside
the sandbox. Concretely, the sources of the data streams must connect to the query
processing framework of the platform by establishing persistent TCP connections
to it. All other components in the different logical layers of the RA will retrieve
data and information from this layer, so they do not need to be accessible from
external data sources.
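The internals of the unified query processing framework are not described here. To illustrate what a continuous query does, the sketch below is a Python generator that consumes an event stream and keeps a per-card sliding-window total. It emits an alert each time the total exceeds a threshold. The event fields, window length and threshold are hypothetical.

```python
from collections import defaultdict, deque

def continuous_window_query(events, window_seconds=60, threshold=1000.0):
    """Continuously flag cards whose spending in a sliding window exceeds a threshold.

    events: iterable of (timestamp_seconds, card_id, amount) in time order.
    Yields (timestamp, card_id, window_total) each time the threshold is exceeded.
    """
    windows = defaultdict(deque)   # card -> (timestamp, amount) pairs still in the window
    totals = defaultdict(float)    # card -> running sum of amounts in the window
    for ts, card, amount in events:
        window = windows[card]
        window.append((ts, amount))
        totals[card] += amount
        # Evict events that are older than the window relative to the current one
        while window and window[0][0] <= ts - window_seconds:
            _, old_amount = window.popleft()
            totals[card] -= old_amount
        if totals[card] > threshold:
            yield ts, card, totals[card]

# Hypothetical transaction stream: (seconds, card, amount in EUR)
stream = [(0, "c1", 400), (10, "c1", 400), (20, "c2", 50), (30, "c1", 300), (100, "c1", 500)]
for alert in continuous_window_query(stream):
    print("possible fraud:", alert)
```

Because the query runs as a generator, it can consume an unbounded source, such as events read from a TCP connection, and produce results as data arrives.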
5.1.5 Direct Access to on Premise Data Sources


This mode covers scenarios where it is not enough for the integrated solution to
have access only to a snapshot of the dataset that has been periodically loaded into
the sandbox, because access to the overall dataset is required. It also covers
scenarios where data cannot be migrated into the sandbox due to their volume or to
regulatory constraints. In the case of ‘big data’ management, it might be better to
push the data processing down to the source and retrieve only the results that will
feed the tools and components in the analytical layer. This avoids moving all the
data inside the sandbox and loading it into the central repository beforehand. We
call this data access mode direct access to on premise sources, as the sandbox needs
to access the data source directly and send the processing there, without migrating
any data inside. The polyglot component of INFINITECH is responsible for this
functionality. It lies in the processing layer of the RA, even though it pushes the
processing down to the source. Regardless of where the processing physically takes
place, from a logical point of view the polyglot component takes care of it, and all
the components in the analytical layers make use of it. The requirements for the
sandbox are similar to those of the dynamic data ingestion mode, but it is now the
polyglot component that needs to open connections from the sandbox to external
endpoints, using different communication protocols, varying from HTTP to TCP
and SFTP.
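The benefit of pushing processing down to the source can be shown with any SQL engine. In the sketch below, an in-memory SQLite database stands in for the on-premise source; this is an assumption for illustration only and is unrelated to the polyglot component's actual implementation. The same per-customer totals are computed in two ways. The first moves every row out of the source, while the second lets the source aggregate and moves only the results.

```python
import sqlite3

# An in-memory SQLite database stands in for the on-premise source (assumption)
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE transactions (customer TEXT, amount REAL)")
source.executemany("INSERT INTO transactions VALUES (?, ?)",
                   [("A", 10.0), ("A", 25.0), ("B", 5.0), ("B", 7.5), ("C", 100.0)])

# Without pushdown: move every row into the sandbox, then aggregate there
rows = source.execute("SELECT customer, amount FROM transactions").fetchall()
pulled = {}
for customer, amount in rows:
    pulled[customer] = pulled.get(customer, 0.0) + amount

# With pushdown: the source runs the aggregation and returns only the results
pushed = dict(source.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer").fetchall())

print(len(rows), "rows moved without pushdown;", len(pushed), "with pushdown")
```

Both approaches return the same totals. With millions of transactions, however, only the pushed-down version keeps the transferred volume proportional to the number of customers rather than to the number of rows.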


5.1.6 Blockchain Data Access


As the name of this access mode implies, it covers scenarios where secure access 
to data stored in the Blockchain is required. We are interested here in uses of 
Blockchain as a means of persistent storage of data, and not as a means of verifying 
the consensus for a given transaction. As described in the corresponding deliverables
of the tasks related to the Blockchain technology in WP4, access is granted via a
specified API. Therefore, from a functional view, the components that need to
access Blockchain data will have to open a connection to the provided APIs and
retrieve the data. Blockchain data needs no pre-processing at the source, nor can it
benefit from the processing capabilities of the INFINITECH data management
components. Therefore, it is the components in the analytical layer of the RA that
need to be granted access to the Blockchain storage via the appropriate designated
endpoint. However, from the perspective of the
infrastructure that will orchestrate the automated deployment and maintenance of 
the integrated solution, the requirements are similar to the dynamic data ingestion: 
to allow components inside the sandbox to establish connections with endpoints 
that are located externally from the sandbox. 
5.1.7 Third Party Data Access


In this access mode, the components and analytical tools provided by the
INFINITECH platform need to retrieve data and information provided by external
sources. This scenario differs from the previous ones, and especially from direct
access to on premise sources, as the third party does not grant access to its entire
dataset. Instead, it provides specific APIs that allow others to extract only specific
information. In other words, it may allow a query to be submitted (e.g. how
popular a given trend has been in UK articles during the last week) and its result
to be retrieved. The third-party organization has access to other datasets and has
already performed an initial pre-processing that generates the results for the types
of queries its API allows. Therefore, the data management components in the
processing layers of the RA do not need to perform any further analysis, and this
information can be used directly by the analytical tools. However, from the
perspective of the infrastructure that manages the sandboxes, the requirement is
the same as for Blockchain data access: to allow the components deployed in the
sandbox to open connections to APIs deployed outside.


5.1.8 Datasets Management from the Blueprint Reference 
Environment Perspective 


Even though there are several diverse types of data access, the requirements for the
datasets and for the data access management tools and techniques from the
blueprint reference environment perspective (or rather, from the testbeds and
sandboxes perspective) can be described and addressed by grouping the
methodologies presented in the previous paragraphs into the following macro
categories:


1. Static data ingestion: The blueprint must be able to store static data in a
persistent volume that is accessible by the sandboxes. This requirement is
fulfilled using the Kubernetes PVs (Persistent Volumes) available inside
the blueprint, and in particular PVCs (Persistent Volume Claims), through
which each sandbox claims the portion of storage it needs.

2. Dynamic data ingestion, Blockchain data access and third party data access:
The blueprint must allow the components of the different layers
of the logical architecture view to open connections to external endpoints.
Accordingly, the components deployed inside the sandboxes need to
communicate with those external endpoints. Unless otherwise stated, this
communication is permitted by default through the API Gateway, which forwards
HTTP/HTTPS requests to and from the sandboxes.

3. IoT streaming and direct access to on-premise data sources: The blueprint must
be able to allow components that have already been deployed to connect
and maintain open connections with components that have been deployed
inside a sandbox. Accordingly, the components deployed inside a sandbox
have to maintain an open connection over the TCP protocol.
In this case the connections are not managed by the API Gateway but through
the Kubernetes NodePort. The NodePort functionality exposes the sandbox
service on each worker's IP at a static port in the range between 30000 and
32767. Each worker proxies that port (the same port number on every node)
into the sandbox service. To clarify this concept, suppose that
the sandbox service works on port 390: this port is then remapped to
port 30390 on each worker node, so that the sandbox can be reached by
external components, which maintain an open connection on port 30390.
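The two Kubernetes mechanisms mentioned in items 1 and 3 can be sketched as manifests. The Python sketch below builds them as dictionaries that can be serialized to JSON, which `kubectl apply -f` accepts. The namespace `pilot1`, the names `static-data` and `iot-ingest`, the `app` label, the access mode and the storage size are hypothetical. Note that Kubernetes does not derive the node port from the service port: 30390 has to be set explicitly in `nodePort`, and if the field is omitted a free port is allocated from the 30000-32767 range.

```python
import json
from typing import Dict

NODEPORT_MIN, NODEPORT_MAX = 30000, 32767  # default Kubernetes NodePort range


def pvc_manifest(namespace: str, name: str, size_gi: int) -> Dict:
    """PersistentVolumeClaim through which a sandbox claims storage for static data."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # assumption: single-node read/write
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


def nodeport_service(namespace: str, name: str, app: str, port: int, node_port: int) -> Dict:
    """NodePort Service exposing a sandbox TCP port on every worker node."""
    if not NODEPORT_MIN <= node_port <= NODEPORT_MAX:
        raise ValueError(f"nodePort {node_port} outside {NODEPORT_MIN}-{NODEPORT_MAX}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app},  # pods of the sandbox component
            # Assumption: the container listens on the same port as the Service.
            "ports": [{"protocol": "TCP", "port": port,
                       "targetPort": port, "nodePort": node_port}],
        },
    }


if __name__ == "__main__":
    # Example from the text: service port 390 reachable on port 30390 of every node.
    print(json.dumps(pvc_manifest("pilot1", "static-data", 10), indent=2))
    print(json.dumps(nodeport_service("pilot1", "iot-ingest", "iot-ingest", 390, 30390), indent=2))
```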
Chapter 6


INFINITECH Reference Testbed and Sandbox


6.1 Overview of the INFINITECH Blueprint Reference 
Testbed 


This chapter describes the initial design and implementation of the INFINITECH
blueprint reference testbed through the actual realization, in full compliance,
of the INFINITECH-RA Development and Deployment views, in terms of the
concrete specification and realization of the fundamental and target INFINITECH
concepts of Testbeds, Sandboxes and Datasets management, and of the related tools
and techniques for their effective setup and deployment in the INFINITECH pilots
and validation scenarios. In other words, it describes how the concepts introduced
earlier are realized on the target INFINITECH infrastructure environments.

Moreover, the chapter describes the planned blueprint environment associated with
the initial and preliminary Proof of Concept (PoC) implementation of one of the
official INFINITECH pilots (at the time of writing, WP7 Pilot 5b: Business
Financial Management (BFM) tools delivering Smart Business Advice, owned by the
partner Bank of Cyprus (BOC)).
6.1.1 Development View


With respect to the Development view, we decided to implement DevOps
(Development and Operations) processes. DevOps represents a change in IT
culture, focusing on rapid IT service delivery through the adoption of agile, lean
practices in the context of a system-oriented approach. DevOps emphasizes people
(and culture) and seeks to improve collaboration between operations and development
teams. DevOps implementations utilize technology, especially automation tools
that can leverage an increasingly programmable and dynamic infrastructure from a
life cycle perspective [5].

The practical implementation of DevOps relies on CI/CD processes, which
combine the practices of Continuous Integration (CI) and Continuous Delivery
(CD). In particular:


* Continuous Integration is a practice where development teams frequently
commit application code changes (many times per day) to a shared repository.
These changes automatically trigger new builds that are then validated by
automated testing (as in Development Testing, DevTest [8]) to ensure that
they do not break any functionality.

* Continuous Delivery is an extension of the CI process. It automates
the release process so that new code is deployed to target environments,
typically test environments, in a repeatable and automated fashion.


To fulfil the INFINITECH project goals, the CI/CD processes have been
created in the context of the blueprint reference testbed environment. Moreover,
to build a system that works consistently as a whole, the developers writing
the individual components of the INFINITECH platform need an integrated
environment where they can test their components together with the
other services. To support this process, we implemented a Continuous Integration
environment based on EKS (Elastic Kubernetes Service) [7], a managed
Kubernetes service on the AWS public cloud; more details on EKS are provided in
the following section. Kubernetes is an ideal choice for a Continuous Integration
environment, since it allows easy updates of deployments when new application
images are built, with manifests containing deployment configurations versioned
in Git [8] alongside the application source code. Furthermore, it is easy to spin
up new test environments from scratch, which enables future scenarios such as
automated end-to-end integration testing. Build agents are also created on demand
and removed when done, providing efficient resource utilization and clean
environments that ensure build reproducibility.




On the target cluster, a namespace named devops has been created to host
the DevOps tools, which are:


* Gitlab [9], a Git repository manager that lets the developer teams
collaborate on INFINITECH's source code.

* Jenkins [10], the de facto standard open source automation server for
orchestrating CI/CD workflows.

* Sonatype Nexus [11], a popular artefact repository that also works as a
Docker registry, as required in our case.

* OpenLDAP [12], used as the single user directory for all tools, centralizing
authentication and simplifying the management of developer accounts.

* Helm [13], a package manager that streamlines installing and managing
Kubernetes applications.


Figure 6.1 shows how CI/CD works for a specific partner (e.g. Partner "A").
When a developer pushes new component code, Gitlab invokes a webhook on
Jenkins, which starts any job affected by the code changes. The job builds the
component, runs the unit tests and, if everything succeeds, builds an
updated Docker image and pushes it to Nexus. The next step deploys the
updated component, using Helm, into the specific partner namespace: there are
as many namespaces as partners, in order to maintain the correct isolation between
all INFINITECH partners. At the end of the process, Jenkins sends a notification
to a dedicated CI/CD channel in the INFINITECH Slack [14] workspace, so that
developers are informed that a new build occurred and whether it was successful or
not. In case of errors, developers inspect the build logs, find the problem and
correct it. In case of success, developers go ahead and test that the new version
works correctly in the test environment.
Figure 6.1. CI/CD workflow.
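The sequence of the CI/CD job described above (build, unit tests, image build and push, Helm deployment, notification) can be sketched in Python as follows. This is an illustrative model rather than the actual Jenkins pipeline: the `make` targets, the registry host, the chart path and the `image.tag` value are hypothetical, and the `notify` callback stands in for the Slack notification.

```python
import subprocess
from typing import Callable, List, Tuple

Step = Tuple[str, List[str]]


def ci_cd_steps(component: str, tag: str, registry: str, namespace: str) -> List[Step]:
    """Ordered commands for one component of one partner (all names hypothetical)."""
    image = f"{registry}/{component}:{tag}"
    return [
        ("build", ["make", "build"]),
        ("unit-tests", ["make", "test"]),
        ("docker-build", ["docker", "build", "-t", image, "."]),
        ("docker-push", ["docker", "push", image]),  # push to the Nexus Docker registry
        ("helm-deploy", ["helm", "upgrade", "--install", component,
                         f"./charts/{component}", "--namespace", namespace,
                         "--set", f"image.tag={tag}"]),
    ]


def run_pipeline(steps: List[Step],
                 runner: Callable = subprocess.run,
                 notify: Callable[[str], None] = print) -> Tuple[bool, List[str]]:
    """Run the steps in order, stop at the first failure, then notify the outcome."""
    executed: List[str] = []
    ok = True
    for name, cmd in steps:
        executed.append(name)
        if runner(cmd).returncode != 0:
            ok = False
            break
    notify(f"build {'succeeded' if ok else 'failed'} after: {', '.join(executed)}")
    return ok, executed


if __name__ == "__main__":
    def dry_runner(cmd: List[str]) -> subprocess.CompletedProcess:
        # Dry run: print each command instead of executing it.
        print(" ".join(cmd))
        return subprocess.CompletedProcess(cmd, 0)

    steps = ci_cd_steps("component-a", "1.0.0", "nexus.example.org", "partner-a")
    run_pipeline(steps, runner=dry_runner)
```

Stopping at the first failing step mirrors the behaviour described in the text: a failed build is reported and no image is pushed or deployed.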
We intend to enhance the process by adopting a DevSecOps [15] approach and
including related tools such as SonarQube [16] in the CI/CD pipeline.

DevSecOps aims to include security in the software development life cycle from 
the beginning, following the same principles of DevOps. Security is then consid- 
ered throughout the process and not just as an afterthought at the end of it, so that 
different kinds of security checks are executed continuously and automatically, giv- 
ing developers quick feedback if the latest changes introduced a vulnerability that 
must be corrected. 

DevSecOps of course requires that security experts work side by side with devel- 
opers and operations to make sure that security requirements are addressed and best 
practices followed, in addition to validating product design and architecture. 

Moreover, in this context, to facilitate Machine Learning (ML) development
and testing, we also plan to introduce (again, in the following iterations of the
WP6 tasks T6.2 and T6.3) the MLOps [17] methodology (a compound of Machine
Learning and IT Operations), which focuses on:


* Facilitating communication and collaboration between teams.

* Improving model tracking, versioning, monitoring and management.

* Standardizing the machine learning process to prepare for increasing regulation
and policy.


This enhanced methodology will enable us to automate, as far as possible, the
porting of machine learning algorithms to production environments.

Putting this into practice is often very complicated, because ML processes are
often based on heterogeneous environments.

Therefore, the first step towards MLOps is to standardize these environments as
much as possible; in this respect, Kubernetes and containers provide the
abstraction, scalability, portability and reproducibility required to run the same
piece of software in all these environments. As a second step, it is necessary to
standardize the workflows used to construct and build the ML models.

In this sense, we have selected the Kubeflow [18] platform as a candidate for
integration in our blueprint reference testbed, in order to provide an infrastructure
for building models that makes these models and workflows portable. In
particular, ML workflows are defined as Kubeflow pipelines, and a pipeline consists
of these steps:


* Data preparation.

* Training.

* Testing.

* Serving.
Each step is a container, and the output of each step is the input of the following
step. Once compiled, this pipeline is portable across different environments.
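The chaining principle (the output of each step is the input of the next) can be illustrated with a plain-Python analogy. This is not the Kubeflow Pipelines SDK: in Kubeflow each function below would run in its own container and hand over artefacts, whereas here they are ordinary functions, and the toy "model" (a mean predictor) is purely hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[Dict], Dict]


def prepare(data: Dict) -> Dict:
    """Data preparation: drop missing values and split 80/20 into train/test."""
    rows = [r for r in data["rows"] if r is not None]
    cut = int(len(rows) * 0.8)
    return {"train": rows[:cut], "test": rows[cut:]}


def train(data: Dict) -> Dict:
    """Training: the toy model simply predicts the training mean."""
    return {**data, "model": sum(data["train"]) / len(data["train"])}


def evaluate(data: Dict) -> Dict:
    """Testing: mean absolute error of the model on the held-out rows."""
    errors = [abs(x - data["model"]) for x in data["test"]]
    return {**data, "mae": sum(errors) / len(errors)}


def serve(data: Dict) -> Dict:
    """Serving: expose a predict function together with its test metric."""
    model = data["model"]
    return {"predict": lambda: model, "mae": data["mae"]}


PIPELINE: List[Tuple[str, Step]] = [
    ("data-preparation", prepare),
    ("training", train),
    ("testing", evaluate),
    ("serving", serve),
]


def run(pipeline: List[Tuple[str, Step]], inputs: Dict) -> Dict:
    out = inputs
    for _name, step in pipeline:
        out = step(out)  # the output of each step is the input of the next
    return out


if __name__ == "__main__":
    served = run(PIPELINE, {"rows": [1, 2, None, 3, 4, 5, 6, 7, 8, 9, 10]})
    print(served["predict"](), served["mae"])  # 4.5 5.0
```

Because each step only depends on the dictionary it receives, any step can be replaced or rerun in isolation, which is the property that containerized pipeline steps make portable across environments.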


6.1.2 Creation of the EKS INFINITECH Cluster 


The creation of the EKS INFINITECH cluster involves four principal steps:


* Create a specific IAM (Identity and Access Management) role that EKS uses to
provision the cluster.

* Create the VPC (Virtual Private Cloud).

* Create the EKS control plane.

* Create the worker nodes.


To perform all the steps we leverage the AWS console and, where possible,
CloudFormation templates, which allow us to replicate the installation and
configuration at any time in a very straightforward way. To create the IAM role
we use the following AWS CloudFormation template:


AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS Cluster Role'

Resources:

  eksClusterRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - eks.amazonaws.com
          Action:
          - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy

Outputs:

  RoleArn:
    Description: The role that Amazon EKS will use to create AWS resources for Kubernetes clusters
    Value: !GetAtt eksClusterRole.Arn
    Export:
      Name: !Sub "${AWS::StackName}-RoleArn"
To create the VPC for EKS we will use the following template:


AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS Sample VPC - Private and Public subnets'

Parameters:

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC

  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC

  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC

  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-VPC'

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Public Subnets
      - Key: Network
        Value: Public

  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ1
      - Key: Network
        Value: Private01

  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ2
      - Key: Network
        Value: Private02

  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PrivateRoute01:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01

  PrivateRoute02:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02

  NatGateway01:
    DependsOn:
    - NatGatewayEIP1
    - PublicSubnet01
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ1'

  NatGateway02:
    DependsOn:
    - NatGatewayEIP2
    - PublicSubnet02
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ2'

  NatGatewayEIP1:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  NatGatewayEIP2:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet01"
      - Key: kubernetes.io/role/elb
        Value: 1

  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet02"
      - Key: kubernetes.io/role/elb
        Value: 1

  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet01"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet02"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable

  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01

  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

Outputs:

  SubnetIds:
    Description: Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02, !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]

  SecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlaneSecurityGroup ] ]

  VpcId:
    Description: The VPC Id
    Value: !Ref VPC

The previous template creates the VPC with two public and two private subnets.
One public and one private subnet are deployed to the same Availability Zone; the
second public and private subnets are deployed to a second Availability Zone in the
same Region. We chose this layout because it is the one AWS recommends
for critical environments.
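The address plan of the template can be checked with Python's standard `ipaddress` module. The sketch below, using the template's default CIDR values, verifies that the VPC block is an RFC 1918 private range, that every subnet lies inside the VPC and that no two subnets overlap; the four /18 blocks tile the /16 exactly. The deduction of five addresses per subnet reflects the addresses AWS reserves in every subnet.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

VPC = ipaddress.ip_network("192.168.0.0/16")
SUBNETS = {
    "PublicSubnet01": ipaddress.ip_network("192.168.0.0/18"),
    "PublicSubnet02": ipaddress.ip_network("192.168.64.0/18"),
    "PrivateSubnet01": ipaddress.ip_network("192.168.128.0/18"),
    "PrivateSubnet02": ipaddress.ip_network("192.168.192.0/18"),
}


def check_layout(vpc, subnets) -> None:
    """Raise ValueError if the plan is not private, not contained, or overlapping."""
    if not any(vpc.subnet_of(block) for block in RFC1918):
        raise ValueError(f"{vpc} is not an RFC 1918 range")
    items = list(subnets.items())
    for name, net in items:
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside {vpc}")
    for i, (a_name, a) in enumerate(items):
        for b_name, b in items[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a_name} overlaps {b_name}")


def usable_ips(net) -> int:
    """AWS reserves the first four and the last address of every subnet."""
    return net.num_addresses - 5


if __name__ == "__main__":
    check_layout(VPC, SUBNETS)
    print(sum(n.num_addresses for n in SUBNETS.values()) == VPC.num_addresses)  # True
    print(usable_ips(SUBNETS["PrivateSubnet01"]))  # 16379
```

A check like this is useful before changing the template parameters, since overlapping or out-of-range blocks would only surface as a failed stack creation.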

Once the VPC is available, it is possible to proceed with the creation of the EKS
control plane, which can be done directly from the AWS console at
https://console.aws.amazon.com/eks/home#/clusters by entering the following
parameters:


* Name: INFINITECH-BP

* Kubernetes version: 1.17

* Cluster endpoint access: public and private

* Role: eksClusterRole (created in the previous step)

* Secrets encryption: false (activate if necessary)

* Subnets: all

* Security groups: ControlPlane

* Public access endpoint: true


When the EKS Control plane is available (the creation takes about 15 minutes), 
it is possible to create the worker nodes. 

In order to add the nodes, it is mandatory to create a specific IAM role named 
WorkerRole with the following permissions: 


* AmazonEKSWorkerNodePolicy.
* AmazonEKS_CNI_Policy.
* AmazonEC2ContainerRegistryReadOnly.


Moreover, to simplify role management, we add the tag value
INFINITECHBPNode to the role.
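Following the same approach used for the cluster role, the WorkerRole could also be created with CloudFormation instead of the console. The Python sketch below builds such a template as a dictionary and prints it as JSON, a format CloudFormation accepts. The `ec2.amazonaws.com` trust principal and the three managed policy ARNs correspond to the permissions listed above; the tag key `Name` is an assumption, since the text only gives the tag value.

```python
import json

# Managed policies required for the EKS worker nodes (as listed above).
WORKER_POLICIES = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
]


def worker_role_template(tag_value: str = "INFINITECHBPNode") -> dict:
    """CloudFormation template (as a dict) for the WorkerRole used by the nodes."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Amazon EKS Worker Node Role",
        "Resources": {
            "WorkerRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "RoleName": "WorkerRole",
                    # EC2 instances (the worker nodes) assume this role.
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": ["ec2.amazonaws.com"]},
                            "Action": ["sts:AssumeRole"],
                        }],
                    },
                    "ManagedPolicyArns": WORKER_POLICIES,
                    # Tag key "Name" is an assumption; the text only gives the value.
                    "Tags": [{"Key": "Name", "Value": tag_value}],
                },
            }
        },
        "Outputs": {
            "RoleArn": {"Value": {"Fn::GetAtt": ["WorkerRole", "Arn"]}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(worker_role_template(), indent=2))
```

Because the template sets an explicit `RoleName`, deploying it (for example with `aws cloudformation deploy`) requires the `CAPABILITY_NAMED_IAM` capability.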
At this point, the worker nodes can be created from the same AWS console by
clicking on the Compute tab and entering these values:


* Name: T3xlarge_16GB4-4vCPU

* Node IAM role name: WorkerRole

* AMI type: Amazon Linux 2 (AL2_x86_64)

* Instance type: t3a.xlarge (4 vCPU / 16 GB)

* Disk size: 50 GB

* Minimum size: 2

* Maximum size: 3

* Desired size: 2

* Subnets: private-sub01; private-sub02

* Allow remote access to nodes: true

* Allow remote access from: all


At this stage, the Kubernetes cluster is ready to use. 
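For the fully automated recreation discussed later in this chapter, the console values above can also be expressed as request parameters for the AWS SDK for Python (boto3). In the sketch below, `boto3` and valid AWS credentials are needed only when `create_all` is called; the role ARNs, subnet and security group IDs, SSH key name and region are placeholders, and the Kubernetes version is the one given in the text, which may need to be updated to a version EKS still supports.

```python
from typing import Dict, List


def cluster_request(role_arn: str, subnet_ids: List[str], sg_ids: List[str]) -> Dict:
    """Keyword arguments for EKS create_cluster, mirroring the console settings."""
    return {
        "name": "INFINITECH-BP",
        "version": "1.17",
        "roleArn": role_arn,  # ARN of eksClusterRole
        "resourcesVpcConfig": {
            "subnetIds": subnet_ids,  # all four subnets of the VPC stack
            "securityGroupIds": sg_ids,  # ControlPlaneSecurityGroup
            "endpointPublicAccess": True,
            "endpointPrivateAccess": True,
        },
    }


def nodegroup_request(node_role_arn: str, private_subnet_ids: List[str], ssh_key: str) -> Dict:
    """Keyword arguments for EKS create_nodegroup, mirroring the console settings."""
    return {
        "clusterName": "INFINITECH-BP",
        "nodegroupName": "T3xlarge_16GB4-4vCPU",
        "nodeRole": node_role_arn,  # ARN of WorkerRole
        "subnets": private_subnet_ids,
        "amiType": "AL2_x86_64",
        "instanceTypes": ["t3a.xlarge"],
        "diskSize": 50,
        "scalingConfig": {"minSize": 2, "maxSize": 3, "desiredSize": 2},
        # An SSH key without sourceSecurityGroups leaves SSH open to all sources.
        "remoteAccess": {"ec2SshKey": ssh_key},
    }


def create_all(cluster_kwargs: Dict, nodegroup_kwargs: Dict, region: str) -> None:
    import boto3  # imported lazily: only needed when actually calling AWS

    eks = boto3.client("eks", region_name=region)
    eks.create_cluster(**cluster_kwargs)
    # Wait until the control plane is ACTIVE before adding the worker nodes.
    eks.get_waiter("cluster_active").wait(name=cluster_kwargs["name"])
    eks.create_nodegroup(**nodegroup_kwargs)
```

Keeping the parameters in plain functions makes them easy to review and version alongside the CloudFormation templates; tools such as eksctl offer an equivalent declarative route.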


6.1.3 Namespace, Network and Quota Policies


As previously described, we have decided to implement the INFINITECH Sand- 
box concept leveraging the Kubernetes namespace feature. Therefore, the first step 
is to create a dedicated Namespace for each of the INFINITECH target pilots that 
can be leveraged by each of the owning partners to develop and test their own pilot 
applications. 

To create the namespace we have defined a Kubernetes YAML template like the
following:

kind: Namespace
apiVersion: v1
metadata:
  name: pilot1
  labels:
    name: pilot1
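Since one sandbox namespace is needed per pilot, the manifest lends itself to generation. The following Python sketch produces one Namespace manifest per pilot and rejects names that Kubernetes would refuse: namespace names must be RFC 1123 DNS labels (lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters), which is why a name such as `Pilot1` is not accepted. The pilot list is illustrative.

```python
import json
import re
from typing import Dict, List

# RFC 1123 DNS label, the format Kubernetes requires for namespace names.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_manifest(name: str) -> Dict:
    """Namespace manifest for one pilot sandbox, labelled with its own name."""
    if len(name) > 63 or not DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"name": name}},
    }


def sandbox_namespaces(pilots: List[str]) -> List[Dict]:
    return [namespace_manifest(p) for p in pilots]


if __name__ == "__main__":
    for manifest in sandbox_namespaces(["pilot1", "pilot5b"]):
        print(json.dumps(manifest))
```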


Some of the DevOps tools (like Jenkins and Gitlab) need to be exposed on the Internet
for convenient access during the development and testing phases. All the endpoints 
will be HTTPS based, with free certificates generated by Let's Encrypt [19], so we 
do not need to set up a Certification Authority or buy certificates from commercial 
CAs. 

However, we do need public DNS entries on the project domain, mapped to a 
public IP that exposes our services outside of the Kubernetes cluster. 

To obtain the public IP, we use the AWS ELB (Elastic Load Balancer), which is
provisioned automatically when a Kubernetes Service of type LoadBalancer is
created. In our case, this happens while configuring the NGINX Ingress Controller,
which is deployed in a dedicated namespace named ingress-nginx.

To implement network policies that isolate the namespaces from each other
and, where needed, also manage connections between different PODs within the
same namespace (e.g. to meet the security requirements), we use Cilium as the
CNI (Container Network Interface) on EKS.

To do that, we remove the AWS CNI and install Cilium with the following
steps:


kubectl -n kube-system delete daemonset aws-node
helm repo add cilium https://helm.cilium.io/

helm install cilium cilium/cilium --version 1.7.9 \
  --namespace kube-system \
  --set global.eni=true \
  --set global.egressMasqueradeInterfaces=eth0 \
  --set global.tunnel=disabled \
  --set global.nodeinit.enabled=true


After the previous step, each namespace can be isolated by applying the
following rule (written in YAML):


apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "isolate-pilot1"
  namespace: pilot1
spec:
  endpointSelector:
    matchLabels:
      {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        {}
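The effect of this policy can be summarized with a small Python model. It is a simplification, not Cilium's implementation: once a policy selects the endpoints of a namespace, ingress to them is denied unless a rule allows it, and in a namespaced CiliumNetworkPolicy the empty `matchLabels` under `fromEndpoints` matches every endpoint of the same namespace. The endpoint and namespace names are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Endpoint:
    name: str
    namespace: str


def ingress_allowed(src: Endpoint, dst: Endpoint, isolated: Set[str]) -> bool:
    """Would traffic from src to dst be admitted under the isolation policies?"""
    if dst.namespace not in isolated:
        # No policy selects dst: Kubernetes' default allow-all applies.
        return True
    # Policy applied: only sources from the same namespace match the empty selector.
    return src.namespace == dst.namespace


if __name__ == "__main__":
    api, db = Endpoint("api", "pilot1"), Endpoint("db", "pilot1")
    web = Endpoint("web", "pilot2")
    isolated = {"pilot1"}
    print(ingress_allowed(api, db, isolated))   # True: same namespace
    print(ingress_allowed(web, db, isolated))   # False: pilot2 -> pilot1 blocked
    print(ingress_allowed(api, web, isolated))  # True: pilot2 has no policy
```

The model also makes a practical consequence visible: requests from pods in other namespaces, such as `ingress-nginx`, are blocked as well, so exposing a service of an isolated sandbox through the ingress controller requires an additional rule allowing that namespace.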


Moreover, we apply a resource quota on the memory and CPU that each
namespace can consume, using this YAML template:


apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-namespace
spec:
  hard:
    requests.cpu: "800m"
    requests.memory: 2Gi
    limits.cpu: "1"
    limits.memory: 3Gi
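To see what these limits mean in practice, the Python sketch below converts the Kubernetes quantities ("800m" is 0.8 CPU cores, "2Gi" is 2 x 2^30 bytes) and checks whether a new pod would still fit in the namespace. It is a simplified model of quota admission, not the Kubernetes implementation: only the four keys above and binary memory suffixes are handled, and the pod sizes are hypothetical. Note that with compute quotas in place, every pod must declare requests and limits for CPU and memory (or receive defaults from a LimitRange), otherwise its creation is rejected.

```python
from decimal import Decimal
from typing import Dict, List

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu(q: str) -> Decimal:
    """CPU quantity in cores: '800m' -> 0.8, '1' -> 1."""
    return Decimal(q[:-1]) / 1000 if q.endswith("m") else Decimal(q)


def mem(q: str) -> int:
    """Memory quantity in bytes (only binary suffixes handled in this sketch)."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * factor)
    return int(q)


# Values of the ResourceQuota above.
HARD = {
    "requests.cpu": cpu("800m"),
    "requests.memory": mem("2Gi"),
    "limits.cpu": cpu("1"),
    "limits.memory": mem("3Gi"),
}


def usage(pod: Dict) -> Dict:
    return {
        "requests.cpu": cpu(pod["requests"]["cpu"]),
        "requests.memory": mem(pod["requests"]["memory"]),
        "limits.cpu": cpu(pod["limits"]["cpu"]),
        "limits.memory": mem(pod["limits"]["memory"]),
    }


def admits(existing: List[Dict], new_pod: Dict, hard: Dict = HARD) -> bool:
    """True if adding new_pod keeps every quota total within its hard limit."""
    pods = existing + [new_pod]
    return all(sum(usage(p)[key] for p in pods) <= limit for key, limit in hard.items())


if __name__ == "__main__":
    pod = {"requests": {"cpu": "400m", "memory": "1Gi"},
           "limits": {"cpu": "500m", "memory": "1536Mi"}}
    print(admits([pod], pod))       # True: exactly at the quota
    print(admits([pod, pod], pod))  # False: a third pod exceeds it
```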


After the execution of the previous steps, the Deployment view has been com- 
pleted, as shown in Figure 6.2. 
Figure 6.2. INFINITECH blueprint reference testbed.


6.1.4 How to Recreate the Blueprint Testbed for a Specific 
INFINITECH Pilot 


One of the most powerful capabilities enabled by the technological choices
described in the previous sections is that the blueprint reference testbed we have
created on AWS can be recreated from scratch, from the Deployment view
perspective, by each of the partners for their own pilots in two possible ways (see
Figure 6.3):


I. On the same AWS cloud provider (a concrete realization is described earlier).
II. In a bare-metal environment, leveraging their on-premise private data centre
infrastructure or the shared NOVAS Data Centre.
Figure 6.3. Blueprint environment recreation approaches.


The two approaches differ slightly from each other:


I. In the first case, the cluster can be recreated in a fully automated way
using the eksctl tool provided by AWS.

II. In the second case, the cluster can be recreated only in a partially
automated way: as a mandatory prerequisite, the entire infrastructure that will
host the Kubernetes cluster must be created manually, and only afterwards can
the cluster be created using automated tools like Kubespray, provided by the
Kubernetes community.


Nevertheless, one of the major objectives of providing a blueprint reference
testbed is precisely the concept explained above: easy and straightforward
replication of the testbed for all the INFINITECH target pilots'
environments.


6.2 Blueprint Environment for Pilot 5b: Bank of Cyprus 


As mentioned earlier, as a concrete INFINITECH blueprint environment associated
with one of the official INFINITECH pilots, the consortium has selected (at
the time of writing) WP7 Pilot 5b: Business Financial Management (BFM)
tools delivering Smart Business Advice, owned by the partner Bank of Cyprus
(BOC), and in particular its initial and preliminary Proof of Concept (PoC)
implementation.

This section describes the preliminary concrete realization of this PoC and also
sets up the basis for future applications of the concepts described in the previous
section to all the INFINITECH target pilots' environments.
6.2.1 Pilot Objectives


Most of today's Financial Management tools for Small and Medium Enterprises (SMEs)
analyze only past transactions, which makes such tools inadequate in today's
world. Today, SMEs and their customers alike demand just-in-time processing,
transparency and personalized services that help SME owners not only better
understand their business/financial health but also decide on the next best
action to take.

Thus, Pilot 5b aims to assist SME clients of Bank of Cyprus in managing their
financial health in the areas of cash flow management, continuous spending/cost
analysis, budgeting, revenue review and VAT provisioning, by providing a set of
AI-powered Business Financial Management tools and harnessing available data to
generate personalized business insights and recommendations. Machine learning
algorithms, predictive analytics and AI-based interfaces will be used to develop
a kind of smart virtual advisor that aims to minimize SME business analysis
effort, focus on growth opportunities and optimize cash flow performance.


6.2.2 Pilot Workflow


The pilot workflow can be analyzed starting from the datasets it has to manage.
Some of the available datasets require real-time data collection, while for others
historical data collection is sufficient to provide actionable business insights. In
detail, transaction and account data related to the respective SME will be drawn
from BOC's repository by a real-time/historical data collector, while transaction
and account data from Open Banking (PSD2), as well as BOC's customer data, will
be gathered by a historical data collector. In addition, an external data collector
will be used to integrate other related Open Banking/macroeconomic data. Use of
the SME's own data sources (e.g. ERP/accounting systems) remains optional, since it
requires the SME's consent for the collection and processing of such data, as well
as the availability of that data in the cloud. The data sources involved can
therefore be summarized as follows:


* Transaction Data from Open Banking (PSD2).

* Transaction Data from SMEs (optional).

* Other Data (Market).

* Other Data from SMEs (optional).

* Accounts Data from BOC.

* Accounts Data from Open Banking.

* Customer Data from BOC.

* Direct Input from SMEs.
In the target pilot PoC, all data except external macroeconomic data will be
pseudonymized (by tokenization) before being uploaded to the IRA. 

The cloud Data Repository (within the IRA) will then store all collected data (along
with the generated insights), past SME financial actions (to measure the degree to
which SME actions reflect the recommended insights), as well as the minimal user
input required. A continuous data stream will connect the Data Repository
with the various deployed BFM tools (machine learning algorithms), allowing
the retraining of the respective AI models and the generation of useful insights
and recommended actions. Reverse pseudonymization will then be applied
before the processed data move to the bank middleware component, which contains
composite APIs and produces push notifications, all of which are offered to the
SMEs via Android, iOS and web applications. When an SME user logs in, the IRA is
also accessed, and insights/recommendations are picked up from the cloud data
repository and provided to the SME user.
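Pseudonymization by tokenization, and its reversal before the data reach the bank middleware, can be sketched as follows. This is a minimal illustration assuming an in-memory vault kept on the bank's side; the field names are hypothetical, and a production tokenization service would persist and protect the mapping.

```python
import secrets
from typing import Dict, Iterable


class TokenVault:
    """In-memory tokenization vault kept on the bank's side (illustrative only)."""

    def __init__(self) -> None:
        self._to_token: Dict[str, str] = {}
        self._to_value: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # The same value always maps to the same token, so joins still work.
        if value not in self._to_token:
            token = "tok_" + secrets.token_hex(8)
            while token in self._to_value:  # guard against a (very unlikely) collision
                token = "tok_" + secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str) -> str:
        return self._to_value[token]


def pseudonymize(record: Dict, fields: Iterable[str], vault: TokenVault) -> Dict:
    """Replace the identifying fields before the record is uploaded to the IRA."""
    fields = set(fields)
    return {k: vault.tokenize(v) if k in fields else v for k, v in record.items()}


def reverse_pseudonymize(record: Dict, fields: Iterable[str], vault: TokenVault) -> Dict:
    """Restore the original identifiers before data reach the bank middleware."""
    fields = set(fields)
    return {k: vault.detokenize(v) if k in fields else v for k, v in record.items()}


if __name__ == "__main__":
    vault = TokenVault()
    tx = {"customer_id": "C-001", "account": "ACC-123", "amount": 120.5}
    masked = pseudonymize(tx, ["customer_id", "account"], vault)
    print(masked)
    print(reverse_pseudonymize(masked, ["customer_id", "account"], vault) == tx)  # True
```

Random tokens (rather than hashes of the identifiers) keep the IRA side unable to recover the originals without the vault, while the stable value-to-token mapping still lets the models correlate transactions of the same SME.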

The pilot's workflow is depicted in Figure 6.4.
Figure 6.4. Pilot 5b workflow.


6.2.3 Blueprint Reference Testbed Implementation


As stated before, the blueprint reference testbed (Blueprint v1) refers to the
INFINITECH BOC pilot, and its initial and preliminary PoC version (Prototype v1)
aims to accommodate the first version of the pilot. Accordingly, the PoC
will not use all the available components of the INFINITECH platform, but only
a selected subset.
Group                                            | Prototype_v1 / Blueprint_v1
On Premise Data Source Layer and Data Management | (to be decided)
Analytics Layer                                  | Cash Flow Prediction, Transaction Categorization
Data Models and Semantics                        | Preprocessing
Data Security and Privacy                        |
Interfaces                                       | Proxy, API


Figure 6.5. Components of the Prototype v1 and Blueprint v1 versions.


In general, the blueprint testbed basic assumptions are: 


* The pilot does not utilize Blockchain technology, so all the Blockchain- 
oriented components will not be integrated. 

* Data migrated to the blueprint testbed is owned by BOC and will 
be pseudonymised (using the tokenization technique) before entering 
the INFINITECH ecosystem. There is no need for an INFINITECH 
anonymizer for the Prototype v1 PoC. 

* The first PoC version of the pilot (Prototype v1) may not fully exploit the 
first version of the pilot Blueprint (Blueprint v1). There might be addi- 
tional INFINITECH components (especially regarding the Data Manage- 
ment layer) that remain to be decided. 


Prototype v1 and Blueprint v1 versions will include the components listed in 
Figure 6.5. 

The next version of the PoC prototype (Prototype v2) will include stream pro- 
cessing, possibly anonymization and some complementary components of the Ana- 
lytics group. 

The ML/DL models of the Analytics Components (Cash Flow Prediction,
Transaction Categorization) will be trained offline and then loaded into
the blueprint analytics component. The models' training process (including the
validation and evaluation processes) is depicted in Figure 6.6.

Given the aforementioned points, a REST API will be developed on top, which
will deliver real-time information when invoked. This data will be forwarded to the
specific analytics component (Cash Flow Prediction, Transaction Categorization)
via a REST API/TCP connection or another message broker.

In order to run the pilot inside the blueprint environment a dedicated sandbox 
named pilot5b has been created. The components deployed in such sandbox are: 
Analytics, Pre-processing and Anonymizer. 
Figure 6.6. Models training process.


Figure 6.7. Pilot 5b blueprint reference architecture. 


Client connections from the outside world towards this sandbox, and towards all
sandboxes deployed in the blueprint, are managed through an API Gateway (GW),
which is a key component of the IRA Interface layer. The API GW in the blueprint
environment is based on Istio [20] and is deployed in a dedicated
sandbox named istio-system.

The process workflow is as follows. When a user sends an HTTPS request, the 
API GW forwards it to the Analytics POD, which will then request the appropriate 
historical data, stored on premises, from the “Data Management” layer via a 
data connection/REST API or another message broker. The retrieved data will first be 
anonymized and then pre-processed using the same approach as in the training 
procedure. The data will then be forwarded back to the Analytics component and 
fed into the models to infer the outcome of each model. Finally, the 
results will be either returned to the user or saved to the internal database 
available within the Analytics POD, depending on the scope of the process. The proposed 
blueprint architecture is illustrated in Figure 6.7. 
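The request workflow can be sketched in Python as a chain of steps. This is a minimal illustration under stated assumptions. The `AnalyticsPod` class, its callables and the in-memory `results_db` are hypothetical stand-ins for the Data Management layer, the Anonymizer, the trained models and the POD's internal database. HTTP handling by the Istio-based API GW is omitted.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class AnalyticsPod:
    """Hypothetical stand-in for the Analytics POD of the pilot5b sandbox."""

    fetch_history: Callable[[str], List[dict]]  # call to the Data Management layer
    anonymize: Callable[[dict], dict]           # Anonymizer component
    model: Callable[[List[float]], float]       # pre-trained model, inference only
    results_db: Dict[str, float] = field(default_factory=dict)  # internal database

    def preprocess(self, rows: List[dict]) -> List[float]:
        # Must mirror the preprocessing used at training time; this placeholder
        # just orders transactions by date and keeps their amounts.
        return [row["amount"] for row in sorted(rows, key=lambda r: r["date"])]

    def handle(self, customer_id: str, persist: bool = False) -> Optional[float]:
        rows = [self.anonymize(row) for row in self.fetch_history(customer_id)]
        prediction = self.model(self.preprocess(rows))
        if persist:
            self.results_db[customer_id] = prediction  # kept inside the POD
            return None
        return prediction  # returned to the user through the API gateway


history = {
    "c1": [
        {"date": "2021-02-01", "amount": 120.0, "name": "ACME Ltd"},
        {"date": "2021-01-01", "amount": 80.0, "name": "ACME Ltd"},
    ]
}
pod = AnalyticsPod(
    fetch_history=lambda cid: history[cid],
    anonymize=lambda row: {k: v for k, v in row.items() if k != "name"},
    model=mean,  # naive placeholder for a cash-flow prediction model
)
print(pod.handle("c1"))  # 100.0
```

Passing each step as a callable keeps the sketch neutral about how the real components communicate, whether through REST, TCP or a message broker.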


Chapter 7 


Lessons Learnt from INFINITECH Pilots 


7.1 Lessons Learnt [INFINITECH-Whitepaper-Lessons Learnt] 


The context for this section on Lessons Learnt within INFINITECH is the set of 
innovations we have explored in the project (e.g., in architectures, processes and 
business models) and how they promote the global creation and sharing of knowledge 
(influenced by prior innovations such as GitHub and GitLab, open science and 
open Networks of Excellence). Similar approaches are increasingly used in agile 
pilots by governments, large corporations and trans-national researchers to create 
value for society by combining cross-disciplinary expertise as follows: 


(a) Insights into ways to create some form of capacity to look forward 
(e.g., Horizon Scanning) and to anticipate and model potentially important 
changes. 

(b) Lessons learned regarding how (and how not) to create some form of analysis 
of joint experience, the better to look back and to look around. 


In Horizon Europe, the combination of (a) and (b) supports the spread of 
expertise in ‘Impact’ (including disseminating know-what and know-how) and in 
transferable Pathways (e.g., cross-cluster, cross-category and cross-program, as 
in Pathways to Impact). 

In INFINITECH, this focus on generalizable and multi-context competence is 
planned to lead to (c) and (d): 


(c) a collection of ‘Lessons Learnt’ by members of an ecosystem (e.g., analogous 
to Open Science, or as in adopting Open Standards for FinTech), leading to: 

(d) shareable and generalisable competences in learning to learn, e.g., learning 
how to refine collections of ‘Lessons Learnt’, and how to transfer, generalise, 
update or correct know-how, to boost community knowledge. 


7.2 Cluster/Category 1: Smart, Reliable and Accurate 
Risk and Scoring Assessment 


7.2.1 Pilot #2 - Real-time Risk Assessment in Investment 
Banking 


(Near) Real-time risk assessment: one of the lessons learnt (which, as for each 
pilot, is planned to be quantified in the closing months of the project in terms of 
the added value of the tool developed) is that Pilot #2 enables stakeholders, i.e., 
users such as traders and risk managers, to perform near real-time risk assessment 
with several additional features, including sentiment analysis. The time required 
to obtain risk estimations is in line with the target KPIs. Notably, the VaR/ES of 
a portfolio consisting of 8 instruments can be obtained in less than 1 second, 
enabling fast what-if analysis. Furthermore, the DeepVaR algorithm, developed in 
the context of the project, turns out to be a reliable alternative to classical VaR 
approaches, delivering accurate VaR estimations even in periods of high volatility 
in financial markets. These performance enhancements have been validated through 
the deployment of P2 in the dedicated testbed and its back-testing on a large 
amount of historical data. 
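For reference, the sketch below implements the classical historical-simulation VaR and ES that DeepVaR is compared against. It is not the DeepVaR algorithm. It assumes a matrix of historical returns with one column per instrument and fixed portfolio weights, both hypothetical inputs.

```python
import numpy as np


def historical_var_es(returns: np.ndarray, weights: np.ndarray, alpha: float = 0.99):
    """Historical-simulation VaR and ES of a portfolio, as positive losses.

    returns: array of shape (T, n_instruments) with historical returns.
    weights: array of shape (n_instruments,) with portfolio weights.
    """
    portfolio = np.asarray(returns, dtype=float) @ np.asarray(weights, dtype=float)
    losses = -portfolio
    var = float(np.quantile(losses, alpha))
    # ES: average loss in the tail at or beyond the VaR threshold.
    es = float(losses[losses >= var].mean())
    return var, es


# Hypothetical example: 500 days of simulated returns for 8 instruments.
rng = np.random.default_rng(seed=0)
sim_returns = rng.normal(0.0, 0.01, size=(500, 8))
equal_weights = np.full(8, 1 / 8)
var99, es99 = historical_var_es(sim_returns, equal_weights, alpha=0.99)
```

By construction ES is never smaller than VaR, since it averages the losses at or beyond the VaR threshold.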

Sentiment analysis: P2 provides sentiment analysis of financial news using 
transfer learning based on the pre-trained FinBERT model. The user has 
access to the original text as well as its sentiment, and can therefore assess the 
output of this AI algorithm. The use of transfer learning also addresses 
environmental concerns regarding the energy consumption required to train 
AI-based models, since the FinBERT model used is already trained 
and validated and is used only for inference. 

Towards a Higher Level of Innovation: Since many financial institu- 
tions still utilise end-of-day (EoD) data rather than intraday data for risk 
assessment, P2 developed an innovative solution that can be used with real- 
time data. 

Highly Interoperable/Scalable: P2 can be integrated with a variety of data 
sources and/or systems with little effort. Furthermore, the entire system is highly 
scalable since i) the risk measurements are first calculated in a univariate and 
parallel manner for each input time series, and ii) it is built on Kubernetes, 
which allows distributed computations with resource auto-scaling. 
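The univariate, parallel first step can be sketched as follows, assuming each input series is a NumPy array of returns. A local thread pool stands in here for the Kubernetes-based distribution described above, where each series could instead be processed in its own pod. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

import numpy as np


def univariate_var(series: np.ndarray, alpha: float = 0.99) -> float:
    # Historical VaR of one return series, reported as a positive loss.
    return float(np.quantile(-np.asarray(series, dtype=float), alpha))


def var_per_series(series_by_name: Dict[str, np.ndarray], alpha: float = 0.99,
                   max_workers: int = 4) -> Dict[str, float]:
    """Compute each series' VaR independently and in parallel."""
    names = list(series_by_name)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda name: univariate_var(series_by_name[name], alpha), names)
        return dict(zip(names, results))
```

Because each series is handled independently, no worker depends on another. This is what lets the workload scale out horizontally.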

Easy-to-understand: P2 is easy to use and provides a simple visualisation, 
so the user does not need any specific expertise. Through what-if analysis, 
the tool delivers additional information: it can be used to obtain real-time 
pre-trade risk analysis and to assess candidate trades on an individual basis as 
well as in the context of the entire portfolio. P2 also makes it possible to compare 
the risk assessments with several VaR models (e.g., variance-covariance 
methods, historical VaR). 
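To illustrate the kind of comparison P2 offers, the sketch below contrasts a variance-covariance (Gaussian) VaR with a historical VaR for the same portfolio. Both are textbook formulations, not the pilot's code. The function names are hypothetical, and the returns matrix and weights are assumed inputs.

```python
from statistics import NormalDist

import numpy as np


def variance_covariance_var(returns, weights, alpha: float = 0.99) -> float:
    """Parametric (Gaussian) VaR of a portfolio, as a positive loss."""
    returns = np.asarray(returns, dtype=float)   # shape (T, n_instruments)
    weights = np.asarray(weights, dtype=float)   # shape (n_instruments,)
    mu = float(returns.mean(axis=0) @ weights)
    cov = np.atleast_2d(np.cov(returns, rowvar=False))  # 2-D even for one instrument
    sigma = float(np.sqrt(weights @ cov @ weights))
    z = NormalDist().inv_cdf(alpha)               # about 2.326 for alpha = 0.99
    return z * sigma - mu


def historical_var(returns, weights, alpha: float = 0.99) -> float:
    """Empirical quantile of historical portfolio losses."""
    portfolio = np.asarray(returns, dtype=float) @ np.asarray(weights, dtype=float)
    return float(np.quantile(-portfolio, alpha))


def compare_var_models(returns, weights, alpha: float = 0.99):
    return {
        "variance-covariance": variance_covariance_var(returns, weights, alpha),
        "historical": historical_var(returns, weights, alpha),
    }
```

The two estimates diverge most when returns have fat tails. The Gaussian assumption then understates extreme losses that the historical quantile still captures.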

Stakeholders’ Feedback: Some stakeholders highlighted the need for more 
technical detail, considering ES to be more important than VaR. In this context, 
P2 assesses the risk of the given portfolios not only in terms of VaR but also in 
terms of ES. Note that the novel DeepVaR model provided by the pilot can be 
used to estimate both risk metrics. 


7.2.2 Pilot #15 - Open Inter-banking Pilot 


Semantic specialisation: General-purpose semantic engines are not effective 
at understanding banking-related concepts. Pilot 15 seeks to specialise a 
state-of-the-art model such as BERT to the banking semantic domain. 

Efficiency of the training process: Minimising the training effort is key to 
ensuring fast development of innovative AI-based solutions. Pilot 15 worked 
on a weakly-supervised method optimised against legacy semantic resources. 

Collaborative approach: The open innovation paradigm is a core value in facing 
the challenges of a complex and fast-growing scenario. Pilot 15 enhanced 
collaboration among competitors through shared governance. 

Research-driven approach: Working in an experimental and competitive 
environment may help enhance flexibility and promote a research-driven 
mindset. Pilot 15 enforced a process of continuous research and evaluation 
of experimental artefacts. 


7.3 Cluster/Category 2: Personalized Retail and 
Investment Banking Services 


INFINITECH Cluster #2 includes four (4) pilots, whose purpose is to offer 
custom-made financial services for both retail and investment banking. These 
comprise several personalized services built on customer-centric analytics and 
personalized digital assistants. Their ultimate objective is to deliver improved 
and/or new technology and business services for banks, SMEs, financial institutions 
and governmental agencies, and to improve customer experience through 
personalized proposals and more effective servicing. Although they address different 
financial services, they mainly use AI and ML technologies, such as transaction 
data processing, customer profiling and recommendation optimization. The 
similarities among the Cluster #2 pilots are driven by the needs of both their 
organizations and their customers for personalized services, as shown in Figure 7.1. 

Figure 7.1. Cluster #2 pilots overview. 


7.3.1 Pilot #3 - Collaborative Customer-centric Data Analytics 
for Financial Services 


Pilot #3 aims to enhance the verification process known as Know Your Customer 
(KYC) within regulated banking and financial activities, through which secure and 
trustworthy data sharing and analysis of financial information (customer, account 
and transaction data) should be provided. It particularly focuses on identifying 
unlawful activity, such as money laundering and bank fraud, in relation to financial 
activities concerning human trafficking, in the context of a global initiative named 
“Stop the Traffik”, using the Traffik Analysis Hub, the Traffik Analysis Hub Ontology 
(TAHO) and INFINITECH technologies. 


* P#3 addresses KYC issues for banks by using AI to search for human trafficking 
data typologies within the bank’s own customer data. The KYC process will 
become more effective if data typologies that produce results are shared between 
banks. This raises data privacy issues. EU data privacy law can currently be 
interpreted as granting criminals data privacy, with certain derogations and 
“let-outs” for the public interest. Whether a bank’s KYC needs can be said to be 
in the public interest is unclear. On the one hand, using a negative KYC result to 
undermine the economics of human trafficking is certainly in the public interest; 
on the other hand, a bank is not a prime engine of law enforcement and as such 
may not be able to make a clear judgement that its customer is carrying on 
criminal activities and is therefore not entitled to data privacy. For banks to be 
able to share the red-flag typologies that have produced positive KYC results, 
data privacy policy will have to take the P#3 product into account. One member 
of P#3 is BPFI, which can influence Irish and EU data protection policy as it 
applies to banks in Ireland. 

The use of data typologies as the means of sharing across multiple banks 
allows the red-flag indicators to be shared without actual customer data being 
exchanged. This allows the participating banks to remain compliant with 
their GDPR obligations. It should also be noted that the banks will use 
the outputs of the P#3 service to help identify areas of concern on which 
the bank fraud teams can focus. There will always be a human 
involved in the final decision to act on the red flags. The P#3 service will not 
be used to automatically decline any customer application or activity. 


* P#3 developed data typology identification as a tool that becomes more 
powerful as more participants use it. P#3 is actively disseminating the tool via 
webinars with the Joint Intelligence Group (JIG) and the International-JIG in 
April. P#3 also disseminates the use of the tool via a monthly analysts’ call of 
all users of the Traffik Analysis Hub (TAH) (up to 65 organisations). BoI is a 
known leading contributor of human trafficking alerts to law enforcement and 
has had positive media exposure for doing so. BoI expects the P#3 tool to 
enhance its already industry-leading stance on human trafficking. These 
initiatives are expected to add significantly to the number of P#3 users when it 
is finally launched on the market. P#3 has now engaged a second large bank in 
the Irish ecosystem to test the service. Between these two banks, we can reach 
over 75% of Ireland’s banked population. 

Because of data privacy concerns regarding data sharing, the P#3 tool will 
be used within each bank’s own data set rather than bank data being shared 
to the hub where the P#3 tool resides. Even so, it is envisaged that P#3 users 
should share the typological patterns produced within their datasets. 
This will be an ongoing tension between the banks sharing patterns and their 
regulatory obligations. 

* Sharing data typologies rather than data will also allow participating banks 
to edit and evolve existing data typologies. If, when using a data typology, one 
of the banks finds a way to better refine the combination of flags, that bank 
can then resubmit the enhanced data typology to the common library. 
This will allow other banks to ingest the enhancement easily rather than 
having to build entirely new rules. 
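The typology-sharing idea can be sketched in Python. A typology carries only indicator names and a threshold. Each bank evaluates it on its own data, and refined versions are resubmitted to a common library. Everything here is a hypothetical illustration, not the TAHO schema or the P#3 implementation: the indicator names, the thresholds and the `Typology` and `TypologyLibrary` types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Transaction = dict  # locally held record; never leaves the bank

# Hypothetical shared vocabulary of red-flag indicators. Each bank evaluates
# them on its own data; only indicator names appear in a typology.
INDICATORS: Dict[str, Callable[[Transaction], bool]] = {
    "cash_heavy": lambda t: t["cash_share"] > 0.8,
    "many_small_transfers": lambda t: t["transfers_per_day"] > 20,
    "late_night_activity": lambda t: t["hour"] < 5,
}


@dataclass(frozen=True)
class Typology:
    name: str
    version: int
    flags: Tuple[str, ...]  # indicator names, no customer data
    min_hits: int           # flags that must fire to raise a red flag


def red_flags(typology: Typology, transactions: List[Transaction]) -> List[str]:
    """IDs of local transactions to pass to the fraud team for human review."""
    alerts = []
    for t in transactions:
        hits = sum(INDICATORS[flag](t) for flag in typology.flags)
        if hits >= typology.min_hits:
            alerts.append(t["id"])
    return alerts


class TypologyLibrary:
    """Common library: keeps the latest version of each typology."""

    def __init__(self) -> None:
        self._latest: Dict[str, Typology] = {}

    def submit(self, typology: Typology) -> None:
        current = self._latest.get(typology.name)
        if current is None or typology.version > current.version:
            self._latest[typology.name] = typology

    def get(self, name: str) -> Typology:
        return self._latest[name]


library = TypologyLibrary()
v1 = Typology("cash_mule", 1, ("cash_heavy", "many_small_transfers"), min_hits=2)
library.submit(v1)
# Another bank refines the flag combination and resubmits an enhanced version.
library.submit(Typology("cash_mule", 2, v1.flags + ("late_night_activity",), min_hits=2))
```

Only the `Typology` objects travel between banks. The transactions passed to `red_flags` stay local, and the alerts it returns are meant for human review, not automatic decisions.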


7.3.2 Pilot #4 - Personalized Portfolio Management (“Why 
Private Banking cannot be for Everyone?”) 


Pilot #4 developed a Portfolio Construction and Optimization algorithm (Prive 
Optimizer or “AIGO”) and improved and expanded its capabilities as an 
artificial intelligence engine to support better, personalised investment 
propositions for retail clients. This will enable the approach to make “Private 
Banking like services available for everyone”. By developing a service driven by 
customer demand and preferences, Pilot #4 provides a leading technology that 
offers a fully scalable, digitised and personalised advisory and wealth management 
journey for financial institutions (e.g., banks, EAMs, investment firms, insurance 
companies and brokerage firms) and market participants. 


* Professional wealth management: Traditionally involves a high degree of human 
interaction and significant management costs. Pilot 4 shows that this process can 
be digitized to a large extent and that this approach can be extensively adapted to 
individual product variations and investors’ needs. 

* Technology vendors and financial institutions tend to use terminology that is 
not familiar to the general public. Pilot 4 uses illustrations at various points of 
the user journey to graphically support decisions by investors without financial 
training. 

* Challenge: obtaining standardized data and long, correct data histories for each 
asset, and finding the right risk & return balance. A fairly equal distribution of 
the underlying investment universe is therefore pursued, making it possible to 
cover a wide range of potential individual investment preferences. 

* Change of Investors’ Requirements and Targets: Megatrends and economic and 
political effects may require regional reassessments or rebalancing of 
investments. Rebalancing, the repetitive task of checking whether an existing 
portfolio needs to be readjusted, is a continuous effort that could be fully digitized. 

* Transparency as a Trust-Building Element: We show volatility data and the 
positive portfolio effects of mixing higher- and lower-risk investments in the 
given portfolio, and we provide portfolio comparison options. We thus try to 
maximize the investor’s ability to analyse and understand individualized 
proposals. 
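Two of the points above can be sketched in Python: the diversification effect of mixing higher- and lower-risk investments, and the repetitive rebalancing check. The two-asset volatility formula is standard portfolio theory. The drift `tolerance`, the asset names and the function names are hypothetical and are not taken from the AIGO engine.

```python
import math
from typing import Dict, Tuple


def two_asset_volatility(w1: float, s1: float, s2: float, rho: float) -> float:
    """Portfolio volatility for weights w1 and w2 = 1 - w1:
    sigma_p^2 = w1^2 s1^2 + w2^2 s2^2 + 2 w1 w2 rho s1 s2."""
    w2 = 1.0 - w1
    variance = w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2
    return math.sqrt(variance)


def rebalancing_check(values: Dict[str, float], targets: Dict[str, float],
                      tolerance: float = 0.05) -> Tuple[bool, Dict[str, float]]:
    """Flag weight drift beyond `tolerance` and return the trades restoring targets."""
    total = sum(values.values())
    assets = set(values) | set(targets)
    drift = {a: values.get(a, 0.0) / total - targets.get(a, 0.0) for a in assets}
    trades = {a: targets.get(a, 0.0) * total - values.get(a, 0.0) for a in assets}
    return any(abs(d) > tolerance for d in drift.values()), trades


# Mixing a riskier asset (20% volatility) with a safer one (5%), uncorrelated:
print(round(two_asset_volatility(0.5, 0.20, 0.05, 0.0), 4))  # 0.1031, below the 0.125 weighted average
needs_rebalance, trades = rebalancing_check({"equity": 70.0, "bonds": 30.0},
                                            {"equity": 0.6, "bonds": 0.4})
```

Only when the correlation is exactly 1 does the portfolio volatility equal the weighted average of the individual volatilities. Any lower correlation yields the "positive portfolio effect" shown to investors.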

7.3.3 Pilot #5b - Business Financial Management (BFM) Tools 
Delivering Smart Business Advice 


Pilot #5b revolves around SMEs and their need for business financial management 
tools. SMEs today are reconsidering their banking relationships, leaving their 
primary banking providers vulnerable to increasing competition. Establishing a 
versatile platform that can unlock new services and can expose and seamlessly 
integrate evolving technologies is a key success factor. The pilot offers a Business 
Financial Management (BFM) solution on top of the bank’s core banking activities, 
aiming to add real value for the bank’s SME customers and to attract new ones 
through this new offering. 


* Data tokenization proved to be a valuable anonymization methodology 
that reduced compliance/regulatory challenges, in particular those relating to 
GDPR. However, it increased data preprocessing needs and delayed 
the introduction of near real-time data streaming functionalities for 
transactional data. 

* Outsourcing banking data to cloud providers can turn out to be a lengthy 
and complex undertaking, with no specific guidelines in place other than an 
EBA document (EBA Recommendations on outsourcing to cloud service 
providers, 2017). Having a structured approach and a detailed Testbed 
Development guideline document at hand is crucial to ensure a successful 
cloud implementation. 

* Conducting 1:1 SME workshops and processing their results proved to be a 
cumbersome and time-intensive task. It also revealed contradictory preferences 
and propositions among SMEs, mostly due to the different nature of their 
core businesses. Grouping SMEs by industry could yield more constructive 
feedback. 

* To attract SME customers, the added value of the BFM platform services should 
be clearly communicated and showcased with examples of how the various 
offerings can support business owners with everyday financial and cash-flow 
tasks, so that they can focus on their core business. 


7.3.4 Pilot #6 - Personalized Closed-Loop Investment 
Portfolio Management for Retail Customers 


Pilot #6 developed a Personalized Investment Recommendations Platform for 
Retail Customers, leveraging the customers’ risk profiles and other features. The 
platform will be available to NBG financial advisors, who not only examine each 
customer’s transactional activity but also take into account similarities and 
patterns among customers. The aim is to increase customer satisfaction and to 
maintain and reinforce the customer relationship. 


* Extended Clientele: Pilot 6 develops algorithms for customer profiling and 
categorization according to customers’ intention to invest. These are based not 
only on questionnaire input but also on transactional activity. The innovation 
here is that the analysis covers all retail customers, not only highly affluent 
ones. 

* UI for Financial Advisors: Pilot 6 aims to create a user-friendly UI that 
presents all the necessary information to Financial Advisors in a consolidated way. 

* User-Friendly UI: Through the UI, Pilot 6 aims to provide a “full” picture of 
any personalized proposition to customers. 

* Holistic View: The Recommendation Engine and the personalized proposition to 
the customer are enriched with information from social networks, which provides 
an extra ranking of the proposals. 

* Holistic View: Propositions are provided at a significantly lower level com- 
pared to other recommendation engines. 


7.4 Cluster/Category 3: Predictive Financial Crime and 
Fraud Detection 


INFINITECH Cluster #3 includes five pilot systems that involve Predictive Financial 
Crime and Fraud Detection. The related pilots intend to provide advanced 
financial products and services for banks, supervisory authorities, financial insti- 
tutions and governmental agencies, aiming to prevent and protect against finan- 
cial crimes and fraudulent activities. The pilots are built on top of a mix 
of advanced technologies, based on AI and ML, as well as Big Data and 
Blockchain. 


7.4.1 Pilot #7 - Avoiding Financial Crime 


Pilot #7 will provide a module for calculating fraud prevention and detection 
models that help the banks to enhance their current cybersecurity policies and 
controls to avoid financial crime. By means of unsupervised machine learning 
and complex modelling, and supported by advanced computational power 
(near-quantum) technologies, the solution will provide a near real-time 
operational risk level that takes into account each end-customer’s normal behavior. 
This could greatly improve the detection of financial fraud and reduce losses to 
the banks and to society at large. 
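Scoring activity against each end-customer's normal behaviour can be sketched with a robust per-customer baseline. This is a much simpler stand-in for the pilot's unsupervised models. The modified z-score and its 3.5 cut-off are a common outlier heuristic, not the pilot's method. The function name and inputs are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def behaviour_risk(history: List[float], amount: float,
                   threshold: float = 3.5) -> Tuple[float, str]:
    """Score a new amount against a customer's own past amounts.

    Uses the modified z-score 0.6745 * |x - median| / MAD, which is robust to
    occasional outliers already present in the history.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)  # median absolute deviation
    if mad == 0:
        # Perfectly regular history: any deviation at all is unusual.
        score = 0.0 if amount == med else float("inf")
    else:
        score = 0.6745 * abs(amount - med) / mad
    return score, ("high" if score > threshold else "normal")
```

A per-customer baseline means the same amount can be normal for one customer and anomalous for another. This is the essence of "considering the end-customer's normal behavior".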

* AI Assessment: 


o Limited Risk: No assessment of creditworthiness or credit scoring in this 
system 
o CXB Analysts shall be notified that they interact with an AI system 


* Data requirements: 


o The data quality and size of the synthesized data need to be revised regularly 
o Data governance should be taken seriously from the outset 


* AI Model: 


o XGBoost is a favourable AI framework compared to scikit-learn. 
o Integration into the INFINITECH ML/DL library is to be discussed 


* Deployment: 


o Constraints (data, company policies, etc.) require thorough planning 

o The design needs scrutiny and adaptation to the onsite requirements at 
CXB 

o Synthesized data will be required due to GDPR issues. 


7.4.2 Pilot #8 - Platform for Anti-Money Laundering 
Supervision (PAMLS) 


The pilot will develop a platform named PAMLS (Platform for AML Supervision) 
with several tools to enhance and improve risk-based supervision. A Screening 
tool will process and analyse data from different sources and, using methods based 
on artificial intelligence (AI) and machine learning (ML), try to recognise unusual 
patterns and relationships in the data that could indicate typologies and risks of 
money laundering/terrorist financing (ML/FT) at the level of individual financial 
institutions (FIs). Detected patterns will feed a Risk assessment tool, which will 
assess each FI’s risk from an ML/FT perspective. This will enable the supervisory 
authority (in the case of Pilot #8, BOS) to focus its resources on higher-risk FIs. 
PAMLS will also provide two additional functionalities: a Distribution channel that 
will enable secure data exchange, and a Search engine that allows supervisors to 
look for a specific transaction or a sample of transactions. 


* Pilot development takes place in close cooperation between the data provider 
and end user (BOS) and the technical partner (JSI). To ensure an agile 
approach, regular meetings and workshops with different stakeholders 
(project team members, supervisors, IT, legal and compliance, technical 
partner, other experts) were organized. This approach has proven effective in 
detecting and addressing the issues and challenges identified. 

* One of the main challenges was data quality. Several iterations were required 
to prepare data of the required quality, which were then enriched and 
pseudonymised. During data preparation, a list of data preparation rules was 
created. 

* Due to complex legal and compliance requirements, especially in regard to 
data protection and ethics, experts from Legal and Compliance departments 
are included in the project team, to ensure that the development of the pilot 
is carried out in accordance with legal and ethical requirements. 


7.4.3 Pilot #9 - Analyzing Blockchain Transaction Graphs for 
Fraudulent Activities 


Cryptocurrencies and tokenized assets that are obtained fraudulently can go 
through various transfers on the blockchain. Pilot #9 aims to detect such 
fraudulent activities on massive blockchain transaction graphs. Since blockchain 
data is constantly accumulating and will grow at increasing rates in the future, 
a parallel, scalable transaction graph analysis system is being developed. It runs 
on HPC clusters and can process the growing transaction graph without 
encountering performance bottlenecks. 
Specifically, Pilot #9 will provide the following products and services: 


(i) An open web-based service that operates on massive Ethereum and Bitcoin 
public blockchain data and traces and reports fraudulent cryptocurrency and 
token transaction activity. It is accessible to both common end-users and 
larger financial institutions. A free basic service will be provided to common 
end-users, while paid or agreement-based customized services will be provided 
to larger organizations and agencies. 

(ii) Token transaction analysis services on the permissioned Hyperledger Fabric 
platform, which are currently not offered by other companies. 
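At its core, tracing fraudulently obtained funds is a reachability query on the transaction graph. The sequential breadth-first search below assumes the graph is given as (sender, receiver) address pairs. It only illustrates that query on a single machine. The pilot's system parallelises such analysis on HPC clusters, which this sketch does not attempt. The function and parameter names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Tuple


def trace_funds(edges: Iterable[Tuple[str, str]], sources: Iterable[str],
                max_hops: Optional[int] = None) -> Dict[str, int]:
    """Breadth-first taint tracing: every address reachable from `sources`,
    mapped to the number of transfers (hops) needed to reach it."""
    graph = defaultdict(list)
    for sender, receiver in edges:
        graph[sender].append(receiver)
    hops = {s: 0 for s in sources}
    frontier = deque(hops)
    while frontier:
        node = frontier.popleft()
        if max_hops is not None and hops[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in hops:  # visit each address once
                hops[nxt] = hops[node] + 1
                frontier.append(nxt)
    return hops
```

Parallel versions typically expand each BFS frontier level concurrently across workers. That becomes necessary once the graph holds billions of transactions.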


* There are well-established competitors, but we are not aware of existing 
approaches that place special focus on HPC and scalability. Scalability is 
important for the sustainability of the system, and we have learnt to achieve it 
by developing our software using parallel programming. 

* Technological development issues: New blockchain transaction throughputs 
are increasing to thousands of transactions per second, which means that 
billions of transactions need to be handled. Our system is ready for such 
workloads. 

* Stakeholders’ engagement and stakeholders’ feedback: The user interface is 
being improved. National CBDC efforts also need scalable transaction 
tracing technology. 

* Regulatory compliance: Crypto-asset regulations are still evolving. Our 
system can help organizations meet the new guidelines on crypto assets 
published by the FATF. 

* AI assessment: Minimal risk. We also use other graph algorithms, and 
all results are checked by human personnel. 


7.4.4 Pilot #10 - Real-time Cybersecurity Analytics on 
Financial Transactions’ BigData 


The purpose of this service is to enable security-related anomalies to be identified 
while they are occurring, if possible, by proactively monitoring and taking timely 
action on such potential security threats. The software component will be able to 
monitor in real time the financial transactions of a domestic and mobile banking 
system and will use machine learning models, alongside and in combination with 
traditional high-efficiency analysis techniques, applied on high-volume real data 
flows. 

Thus, the pilot will move from the current post-event detection approaches to a 
new real-time approach based on Big Data Analytics (BDA) technologies. 
For this pilot’s reference scenario, the business service to be delivered relies 
on advances in precise, fine-grained financial fraud analysis and detection. 

Such a business service will make it possible to meet two goals: 


* The early detection of new and subtle types of fraud. Since fraudsters keep 
devising novel ways to scam people and online systems, it is crucial to apply 
AI/ML methods that detect outliers in large transactional datasets and are 
robust to changing patterns. 

* The reduction of the number of false positives, which usually have to be 
analysed to determine whether they are real fraud attempts. To this end, it is 
very important to be able to train, validate and test ML models so that the 
most accurate ones are made operational. 


o Technological capabilities. The proposed system is mainly designed and 
developed on top of advanced Data Science and Machine Learning frameworks 
and open-source technologies for the design, deployment, execution and 
monitoring of big data analytics workflows. 

o Full potential achievement. The proposed system is based on ML model 
training with (small) synthetic data. This point was considered a weakness, 
since the actual degree of innovation can be assessed only by processing real 
data. Pilot #10 has been validated on big data produced by an AI-based 
Synthetic Transactions Generator. 

o Scalability and Integration. The proposed solution is built on a cloud-based 
architecture that can scale computing and storage resources thanks to a 
workflow orchestration engine that leverages Kubernetes for cloud resource 
management. 

o Trustworthiness and compliance. The proposed system is designed with 
banking data sovereignty in mind. The prototype will be released for 
on-premise integration, so as to keep all existing internal procedures and 
avoid external data processing. 
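The goal of reducing false positives while still catching fraud can be illustrated by choosing an alert threshold on a validation set. The sketch below assumes a model that outputs a fraud score per transaction and labelled validation data. The recall target and the function names are hypothetical. The chosen threshold would then be checked once on a separate test set.

```python
from typing import Optional, Sequence, Tuple


def confusion(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) when alerting on score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        alert = score >= threshold
        if alert and label:
            tp += 1
        elif alert:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_recall: float = 0.9) -> Optional[Tuple[float, int]]:
    """Highest threshold (hence fewest false positives) whose recall on the
    validation data is still at least `min_recall`; returns (threshold, fp)."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, _, fn = confusion(y_true, scores, t)
        recall = tp / (tp + fn) if tp + fn else 0.0
        if recall >= min_recall:
            best = (t, fp)  # later candidates are higher thresholds
    return best
```

Recall can only fall as the threshold rises, so the last qualifying candidate gives the fewest false positives for the required detection rate.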


7.4.5 Pilot #16 - Data Analytics Platform to Detect Payments 
Anomalies Linked to Money Laundering Events 


INFINITECH Pilot #16 “Data Analytics Platform to detect payments anomalies 
linked to money laundering events” aims to build a data analytics platform 
to help the Nexi AML team discover, monitor and analyze suspicious scenarios 
related to money laundering through digital card payments, using machine 
learning and advanced analytics methods. To build the data platform, the 
partners are collecting historical data from clients in an anonymized format. The 
anonymization phase is performed before source data is uploaded into the Data 
Lake, where the processing steps are performed. This ensures data confidentiality 
and customer privacy. Since all data is anonymized, it can be used directly for 
data processing and for training machine learning algorithms. 

During the pilot development, we encountered different types of challenges and 
experiences. They gave us the opportunity to work on novel approaches, data, 
applications, errors, and improvement opportunities. We summarized them in 
terms of: 


* Type: the project level involved (e.g., technical, organizational). 
* Change/Performance Consequence: the task/issue requiring action. 
* Cause: what led to the change/consequence. 
* Action: what we did to achieve the goal. 

| Type | Change or Performance Consequence | Cause | Action |
|---|---|---|---|
| Technical (back-end) | Initial difficulties in using the Neo4J database | Neo4J is a novel graph approach, especially as applied to AML data | Frequent interaction to share preliminary testing results and increase collective knowledge |
| Technical (back-end) | Data QA | Assessment performed on edges and nodes rather than tabular data | Training analysis on known use-cases to establish benchmarks for further progressive evaluations |
| Communication | Slight mismatches on mock-up characteristics | Insufficient communication on the progression of feature characteristics | Adoption of an Agile methodology approach and sprint reviews |
| Stakeholders involvement | Suitability of internal AML stakeholders’ involvement with daily working tasks | AML stakeholders need to get used to a novel approach while performing daily activities | Development of use-cases on AML subjects and/or merchants that are actually under investigation |
| Technical (infrastructure) | Deployment of pilot components on INFINITECH infrastructure | Integration of Nexi infrastructure and deliverables with the INFINITECH testbed | Development of a preliminary flowchart to organize and prioritize the development and deployment of each component |


7.5 Cluster/Category 4: Personalized Usage-Based 
Insurance (UBI) Pilots 


The concept of Usage Based Insurance (UBI) is taken to the next level within this 
cluster by applying AI technologies to real-world data collected from the users’ 
environment. Here, different IoT infrastructures are exploited as novel data sources 
and combined with different ML/DL technologies to define, develop and train 
specific AI models. These AI-powered models will assist user classification and risk 
detection processes and, in turn, support customised services offered to insured 
parties and insurance companies. Specifically, this cluster develops a pilot focused 
on connected cars and motor insurance and another pilot related to activity trackers 
and e-health (see Figure 7.2). 


7.5.1 Pilot #11 - Personalised Insurance Products Based on IoT 
Connected Vehicles 


This pilot is oriented to the car insurance business, relying on emerging connected-car 
infrastructures and considering each connected vehicle as an IoT entity. Big Data, 
HPC and AI technologies will be applied here, together with the new 
Vehicle-to-Infrastructure (V2I) paradigm, to identify and define diverse driving 
profiles and thus classify real drivers according to their behaviour. This business 
innovation revisits the way insurance premiums are calculated and thereby supports 
a new set of insurance products adapted to insured car drivers (Figure 7.3). 

Figure 7.2. Personalised insurance pilots’ common approach. 

Figure 7.3. Personalised motor insurance pilot architecture. UBI paradigm 
implementation. 


During the deployment of this motor insurance pilot, we collected a set of 
lessons learnt, based on the experience of our internal insurance company, our 
data providers and the stakeholders’ meetings we organised. Covering different 
points of view, this feedback will help with the future expansion of the proposed 
business model. 

From the (motor) drivers’ perspective, engaging drivers to provide their technical vehicle data on a voluntary basis is fundamental:

* from the volunteers’/drivers’ perspective, to train and test the AI models. Without any incentive or benefit derived from this data-gathering process, drivers are likely to drop out or not use their vehicles on a regular basis. Within the pilot, our vehicle provider engaged real drivers.

* from the insureds’/drivers’ perspective, to evaluate their driving behaviour. In a real scenario, it would be necessary to find ways to ensure this participation (even if it entails an increase in premiums). Finding different ways to capture technical vehicle datasets is another challenge.


From the insurance companies’ perspective, insurers are carefully studying the different possibilities that the AI insights provide. They are waiting to see the final services in operation before applying these insights. We need to highlight the relevance of the AI applied here and show some preliminary results; this is necessary to evolve the final business model.
From the data-gathering and homogenisation point of view, technical vehicle datasets, even when based on OBD standards, depend heavily on the manufacturer and may differ, in terms of volume and measurement units, from one deployment to another. Context information (weather, traffic, etc.) is also highly dependent on the scenario, and not all scenarios can provide the same set of context data. Moreover, context information needs to be relevant to the areas where the vehicles are driven.
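To make the homogenisation issue concrete, the following is a minimal Python sketch, under stated assumptions, of normalising vehicle readings reported in different units into a common schema before model training. The field names (`speed`, `fuel`), the unit labels and the conversion table are hypothetical and do not come from the pilot; a real deployment would need a mapping per manufacturer and per OBD parameter.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to the target units (km/h for speed, litres for fuel).
# These unit labels are illustrative assumptions, not the pilot's actual data schema.
TO_KMH = {"km/h": 1.0, "mph": 1.609344, "m/s": 3.6}
TO_LITRES = {"l": 1.0, "gal_us": 3.785411784, "gal_uk": 4.54609}


@dataclass
class Reading:
    """A single raw reading as reported by one (hypothetical) data provider."""
    speed: float
    speed_unit: str
    fuel: float
    fuel_unit: str


def normalise(reading: Reading) -> dict:
    """Convert a raw reading into the common units used for model training."""
    try:
        speed_kmh = reading.speed * TO_KMH[reading.speed_unit]
        fuel_l = reading.fuel * TO_LITRES[reading.fuel_unit]
    except KeyError as exc:
        # Unknown units are rejected rather than silently passed through.
        raise ValueError(f"unsupported unit: {exc.args[0]}") from None
    return {"speed_kmh": round(speed_kmh, 2), "fuel_l": round(fuel_l, 2)}


if __name__ == "__main__":
    # Two deployments reporting the same physical quantities in different units.
    print(normalise(Reading(60.0, "mph", 10.0, "gal_us")))
    print(normalise(Reading(96.56, "km/h", 37.85, "l")))
```

Rejecting unknown units, instead of passing values through unchanged, avoids silently mixing incompatible measurements in the training set.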

From the ML engineers’ perspective, the data used for the AI system is biased by the set of real drivers involved (in terms of profiles and driving areas), so the AI prototypes are only truly relevant for this set of users.


7.5.2 Pilot #12 - Real World Data for Novel Health-Insurance 
Products 


The pilot focuses on health insurance and analyses the impact of continuously monitored Real-World Data (RWD), captured from users’ smart devices (bracelets, smartphones, etc.), for personalized risk assessment. The continuous personalized risk assessment is offered via analytics presented in personal and cohort-based dashboards, as well as via predictions of the developed AI models. Once assessed, the risk facilitates the customization of health insurance products.


* Insured users’ perspective
o End users’ questionnaire findings

- They are not really willing to share information with their insurance company, regardless of possible financial benefits. They are more willing to share lifestyle data, but by no means any medical data.

- They want a personal health risk analysis as part of their benefit in the program; we are employing a virtual coach to give them feedback and advice.
o Analysis of pilot study feedback

- The mobile application is very important and was the main reason for pilot study dropouts.

- They want less text (although legal text is needed, in accordance with the GDPR and the Medical Device Regulation).

- They feel the app is too demanding on their phones (they ask for lower processing-power requirements).

- They wish for more feedback (see the point above on personal health risk analysis).

- They ask for freedom in their reports (to be able to report what they want, not what we are interested in). Such unstructured reports are, however, tricky to use in model learning.

- The app was considered easy to use by the end users.

- The added value of the app was low or unclear to the end users. They want an app that differentiates itself from the standard health app on their mobile phones.

- They want more physical activities included within the app, for example biking and swimming.

- They feel that adding health data manually takes too much time, for example the number of glasses of water they drank during the day: they do not remember how much they drank when they open the app, and they do not think about it just after drinking a glass.


* Overall take-away messages:

o Life insurance is a program that runs for life! Gamification features are needed to keep the attention of users and ensure compliance, especially since the users are not under strict doctor’s orders.

o The functionalities of the Healthentia app focus mainly on measuring health data. One explanation for the high number of drop-outs we experienced could be that, for mHealth users, collecting health data is not the point in itself. There needs to be added value for them to be willing to use the Healthentia app over a long period. So, even though potential clients can use the overview of their health data to monitor their own health as a first step towards living a healthy life, this is not enough to keep them engaged in the long term.


* Insurance companies’ perspective (feedback from stakeholders’ workshops)

o They acknowledge the need for premium adaptation.

o They rate monitoring dashboards as more important than model recommendations.
* Regulators’ perspective (feedback from regulators’ workshop)

o They see regulatory implications in both data collection and the use of the data for continuous risk assessment. As far as data collection is concerned, applications that collect such data are commonplace. As for the health insurance use case, the first programs of this kind, rewarding “good behaviour” with discounts, are now being offered to the public.

o The need to comply with regulation clashes with some users’ requests for less and simpler text in the agreements.


* ML engineers’ perspective 


o Data quality (adherence, diversity, volume) is very important to obtain functional models.

o Different devices provide data of different quality (accuracy, timings, etc.), and the correct usage of these devices by the insured user must also be taken into consideration.

o Anonymization is mandatory from a regulatory standpoint. Unfortunately, assessing its effect on model quality was not possible in the study carried out: the data quality was low to begin with, leading to models without acceptable performance, and this seems to be equally true for anonymised and non-anonymised data.

o Model bias is a huge problem. The population the model is to be applied to is everyone, not a small segment. Learning such a model would require an unmanageable number of study participants providing information for too long a time. The only alternative is to start such programs without the AI part and build the necessary models per population segment over the years.
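The staged strategy described above, building models per population segment as data accumulates, could be tracked along the lines of the following minimal Python sketch. It only checks which segments have enough participants and enough adherence to justify training a segment-specific model; the segment keys, the thresholds (`MIN_PARTICIPANTS`, `MIN_ADHERENCE`) and the adherence measure are hypothetical assumptions, not values from the study.

```python
from collections import defaultdict

# Hypothetical thresholds: assumptions for illustration, not values from the pilot.
MIN_PARTICIPANTS = 3
MIN_ADHERENCE = 0.6  # mean fraction of study days with data reported


def segments_ready(participants):
    """Return the sorted list of segments eligible for a segment-specific model.

    `participants` is an iterable of (segment, adherence) pairs, where
    adherence is the fraction of study days on which the user reported data.
    """
    by_segment = defaultdict(list)
    for segment, adherence in participants:
        by_segment[segment].append(adherence)

    ready = []
    for segment, values in by_segment.items():
        mean_adherence = sum(values) / len(values)
        if len(values) >= MIN_PARTICIPANTS and mean_adherence >= MIN_ADHERENCE:
            ready.append(segment)
    return sorted(ready)


if __name__ == "__main__":
    cohort = [
        ("age_30_45", 0.9), ("age_30_45", 0.7), ("age_30_45", 0.8),
        ("age_46_60", 0.9), ("age_46_60", 0.95),  # too few participants
        ("age_61_plus", 0.3), ("age_61_plus", 0.4), ("age_61_plus", 0.5),  # low adherence
    ]
    print(segments_ready(cohort))  # ['age_30_45']
```

Segments that fail either threshold would continue to run without the AI part until enough good-quality data has been collected for them.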


7.6 Cluster/Category 5: Configurable and Personalized 
Insurance Products for SMEs and Agro-Insurance 


Cluster 5 is composed of two pilots that base their analysis on big data from different sources, both open data and satellite imagery, to gather real-world and real-time data. The objective is to develop AI-powered services to enhance risk profiling. These pilots will develop their own architecture by combining INFINITECH and pilot-specific technologies, configure their corresponding sandboxes and run their testbeds, all within the INFINITECH framework.

Cluster 5 pilots are intended to provide configurable and personalised insurance products based on alternative data sources and big data, including various personalised services based on customer-centric analytics and personalised digital assistants (Figure 7.4).

Figure 7.4. Cluster 5 layout.


7.6.1 Pilot #13 - Alternative/Automated Insurance Risk Selection - Product Recommendation for SME


By obtaining data from open sources and applying machine learning, the pilot will be able to monitor changes in risks, so that we can radically improve the management of the risks that companies face in their daily activity.

Pilot #13 collected stakeholders’ comments and opinions through a workshop and online questionnaires. The online questionnaires were necessary because the organized workshop was not sufficiently attended, so we decided to reinforce the feedback through the questionnaires.

Therefore, one of the lessons learned is that mass workshops involving different pilots, even if they are all insurance pilots, are not effective. Given the stage of maturity of the project, we believe that it is more effective for each pilot to seek personalized feedback from the stakeholders who can contribute most to it. Stakeholders, for example from an insurance company, differ greatly depending on whether we are talking about business, agriculture, auto or health insurance.

That said, the feedback we received was very interesting and enriching, as it showed that the stakeholders consider Pilot #13 a high priority, immediately applicable and highly innovative both technologically and business-wise.

In addition to the above with respect to the stakeholders, we complement the 
lessons learned in the following areas: 


* Data: achieving adequate density in the raw information obtained from open sources has been much more complex than anticipated, but two measures taken during implementation have proved effective. One of them is the incorporation of other sources to allow for overlap and complementarity of data.

* Technology: when using the data, it is not viable to depend only on obtaining it from sources via APIs, so Wenalyze's technology of concurrent autonomous micro-robots has been adapted to this reality in order to improve, as mentioned in the previous point, the density and quality of the data.

* Market and insurers: it is undeniable that a data-driven industry such as insurance needs the solution developed in Pilot #13; however, the go-to-market process with insurers has taken longer than expected. Insurers are not certain that they need such data, so the best way we have found to demonstrate the need for the solution is to run pilots and demos with their own data and compare the different qualities of the data.
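The first data measure, incorporating other sources so that their data overlap and complement each other, can be illustrated with a minimal Python sketch. It merges company records from several sources by a shared key, keeps the value from the highest-priority source where sources overlap, and fills gaps from the others; the source names, fields, the priority rule and the `density` measure are hypothetical and do not describe Wenalyze's actual pipeline.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def merge_sources(sources: List[Dict[str, Record]]) -> Dict[str, Record]:
    """Merge per-company records from several sources.

    `sources` is ordered by priority (most trusted first). Each source maps a
    company identifier to a record of field -> value (None means missing).
    Overlapping fields keep the highest-priority non-missing value, and fields
    missing from one source are completed from the others.
    """
    merged: Dict[str, Record] = {}
    for source in sources:
        for company_id, record in source.items():
            target = merged.setdefault(company_id, {})
            for name, value in record.items():
                if value is not None and target.get(name) is None:
                    target[name] = value
    return merged


def density(record: Record, fields: List[str]) -> float:
    """Fraction of the expected fields that have a value."""
    return sum(record.get(f) is not None for f in fields) / len(fields)


if __name__ == "__main__":
    registry = {"B123": {"name": "Acme SL", "activity": None, "website": None}}
    web = {"B123": {"name": "ACME", "activity": "bakery", "website": "acme.example"}}
    merged = merge_sources([registry, web])
    print(merged["B123"])
    # {'name': 'Acme SL', 'activity': 'bakery', 'website': 'acme.example'}
```

Ordering sources by trust is one simple conflict rule; a production system might instead keep all candidate values with their provenance.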


7.6.2 Pilot #14 - Big Data and IoT for the Agricultural Insurance Industry


The pilot aims to provide insurance companies with a robust and cost-effective toolbox of functions and services, allowing them to alleviate the effect of weather uncertainty when estimating the risk of agricultural insurance (AgI) products, reduce the number of on-site visits for claim verification, reduce the operational and administrative costs of monitoring insured indexes and handling contracts, and design more accurate and personalized insurance/coverage products.

The pilot actors, namely AgI companies and stakeholders involved in the Insuretech workshop series, contributed significantly with their comments and suggestions to what became the INFINITECH AgI toolbox MVP.

Having engaged a large number of actors coming from different regions, providing different coverages/AgI insurance products and showcasing different levels of IT infrastructure, we witnessed a large diversification in the prioritization of needs but, at the same time, a strong homogenization of expectations. This was more or less what we expected when we started involving AgI actors from different enterprises (maturity of markets and sectors, different sizes, varied market outreach, significantly different product portfolios, etc.).

Weather information (past climatic data and nowcasting) seems to be a higher priority than satellite data for the majority of AgI actors: their readiness to use upscaled weather information, whether in operations management or in risk assessment and premium underwriting, is higher. On the other hand, although the priority given to satellite data for event- and contract-related information (affected area, severity of impact, prioritisation of in-field visits, etc.) was also medium to high, AgI actors' readiness to use it was hindered by current regulatory restrictions (the majority of agricultural insurance systems require in-field assessment of damage).

Despite these variations and the respective legislative “obstacles”, the INFINITECH-AgI toolbox is being praised for its functionalities and ease of use, its cost-efficiency and the availability of its services, enabling a holistic approach to AgI operations.
We begin the book by providing background on FinTech. FinTech signifies an ongoing technological revolution in the financial sector. For example, means of payment have shifted towards contactless mechanisms and cards, replacing old means of payment such as cheques. According to Juniper, FinTech is “the use of technology to underpin the delivery of financial services”. FinTech has two main pillars, namely technology and the financial system. FinTech disrupts the financial services industry by enhancing customer experience, increasing the speed of service and reducing operating costs through digitization. Despite resistance to change, the financial and insurance services sectors are now embracing this disruption. The waves of digitization, Financial Technology (FinTech) and Insurance Technology (InsuranceTech) are rapidly transforming the financial and insurance services industry. Data is at the epicentre of these changes: the vast majority of digital transformation applications for the finance and insurance sectors are data intensive, and the available big data sources fuel new, more automated, personalized and accurate services.

This book provides a summary of the INFINITECH Reference Architecture (RA) as the core element of the INFINITECH Way Foundation. The RA of the INFINITECH project aims at the development of smart, autonomous and personalized services in the European finance and insurance services ecosystem. The INFINITECH-RA will specify a set of building blocks that will support advanced Big Data, AI and IoT applications. These building blocks will support scalable, unified and interoperable data collection from different sources and databases. We provide a detailed description of the RA for Big Data/IoT/AI in the finance and insurance sectors. The INFINITECH-RA will also specify the structuring principles that will drive the integration of these building blocks into real-life solutions.

The book explains innovative technologies for the financial sector applied in the INFINITECH project. The INFINITECH project relies on the capabilities of its partner members to produce value that is exploitable beyond the lab or the proof of concept of ideas. Expertise from academia and research converges with industrial products to provide solutions to existing problems in the financial sector. The existing data management capabilities in the industry are not sufficient; INFINITECH will provide solutions for integrated data management over the wide range of databases and data sources used by Big Data, IoT and AI applications in finance/insurance.

The marketplace is considered the main driver for asset utilization among members of the INFINITECH ecosystem, as well as the main artefact for the exploitation and sustainability of the project. As the specifications and detailed architecture of the INFINITECH marketplace have been provided in past deliverables, specifically D8.1 and D8.2, this document describes and analyzes, in greater detail than the previous version of the current deliverable (i.e. D8.3), the final version of the developed REST API endpoints of the INFINITECH Marketplace’s back-end.

The marketplace offers ready-to-use solutions covering a wide range of modern business and technical needs, focusing, like most pilots of the INFINITECH project, on the finance and insurance sectors. Most assets are related to Big Data and AI techniques and algorithms, with a variety of datasets, experimentation results and models already available on the marketplace. Besides ready-to-use algorithms, frameworks and combined solutions, some assets offer state-of-the-art IoT and Blockchain solutions.

Additionally, the INFINITECH Marketplace aspires to become a prominent 
multi-sided market platform and Virtualized Digital Innovation Hub (VDIH) 
and continue adding value to practitioners, organizations and communities of 
the Finance and Insurance sector long after the INFINITECH Project has been 
completed. 

Finally, this book and the progress presented within it are well aligned with the project's goals and overall strategy. Additional content, to be populated from the various events and third-party activities (e.g., hackathons, webinars), will be the subject of the last WP8 deliverables (related to third parties) until the end of the project.

The project’s validated solutions at the technological or business level are made available on the project's multi-sided market platform (marketplace) and/or a VDIH for wider use and commercial exploitation. Stakeholders of the digital finance/insurance and FinTech/InsuranceTech ecosystem interact with INFINITECH through the market platform and the VDIH. The market platform is a public web-based environment with various APIs, able to store several types of assets that may result from separate procedures and mechanisms, whether or not they are implemented within the scope of the project. We illustrate the components of the marketplace: the back-end, the front-end and the VDIH. The back-end is the main component of the marketplace. It consists of three different layers and implements the main functionalities for asset management. The front-end is the fourth layer of the market platform. It is a web-based server that presents the offered assets to the users through a friendly UI. The front-end converts all interfaces of the back-end (REST API) into user-friendly interfaces and provides automated forms and processes that make it easier for users to interact with the back-end and benefit from its stored assets. The VDIH is organized into Training Activities and Innovation Services: the Training Activities include courses, workshops and webinars, while the Innovation Services include acceleration programs. The VDIH pages provide all the resources available on the platform related to each type of content.

To build a community around the INFINITECH Marketplace and enrich it with new information, new functionalities such as social login and the ability to add new information were added to the marketplace. Social login fulfils one of the objectives of INFINITECH, which is to create a digital finance ecosystem of innovation with IoT, Blockchain, Big Data and AI solutions and services. To enrich this ecosystem, forms were created that give users the opportunity to share their solutions and services on the INFINITECH Marketplace, because it is important that the information continues to evolve and grow. Moreover, we describe marketplace usage scenarios; the two scenarios are uploading to the VDIH and consulting the VDIH. We complete the section by providing an overview of the baseline technologies, the interfaces and APIs related to assets, users and descriptions, search functionalities, validation, uploading descriptions, uploading and retrieving assets, and authentication scenarios. A detailed list of the assets uploaded to the marketplace is also provided.
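As a rough illustration of the asset-related operations listed above (upload with validation, search and retrieval), the following Python sketch models an in-memory asset registry. It is a hypothetical stand-in for explanation only: the class, field names, allowed asset types and validation rules are assumptions and do not reproduce the INFINITECH Marketplace REST API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed asset categories, loosely following the kinds of assets mentioned in the text.
ALLOWED_TYPES = {"dataset", "model", "algorithm", "framework", "training", "service"}


@dataclass
class Asset:
    asset_id: int
    title: str
    asset_type: str
    description: str
    tags: List[str] = field(default_factory=list)


class AssetRegistry:
    """Hypothetical in-memory registry: validate, upload, search, retrieve."""

    def __init__(self) -> None:
        self._assets: Dict[int, Asset] = {}
        self._ids = itertools.count(1)

    @staticmethod
    def validate(title: str, asset_type: str, description: str) -> None:
        """Reject incomplete or unknown asset descriptions before storage."""
        if not title.strip():
            raise ValueError("title must not be empty")
        if asset_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown asset type: {asset_type}")
        if len(description) < 10:
            raise ValueError("description too short")

    def upload(self, title: str, asset_type: str, description: str,
               tags: Optional[List[str]] = None) -> int:
        self.validate(title, asset_type, description)
        asset_id = next(self._ids)
        self._assets[asset_id] = Asset(asset_id, title, asset_type, description, list(tags or []))
        return asset_id

    def retrieve(self, asset_id: int) -> Asset:
        return self._assets[asset_id]  # raises KeyError if the asset does not exist

    def search(self, text: str = "", asset_type: str = "") -> List[Asset]:
        """Case-insensitive match on title, description or exact tag."""
        text = text.lower()
        return [
            a for a in self._assets.values()
            if (not asset_type or a.asset_type == asset_type)
            and (text in a.title.lower() or text in a.description.lower()
                 or text in [t.lower() for t in a.tags])
        ]


if __name__ == "__main__":
    registry = AssetRegistry()
    registry.upload("Fraud detection model", "model",
                    "Gradient boosting model for card fraud", ["fraud", "AI"])
    print([a.title for a in registry.search("fraud")])
```

In a real marketplace these operations would sit behind authenticated REST endpoints, with the front-end forms calling them on the user's behalf.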

We describe the tools and techniques that will be leveraged to implement the testbed and sandbox concepts within the INFINITECH project, considering that the INFINITECH-RA is designed around a microservices architecture, with services interacting with each other through REST APIs. The benefits of containers, the microservices approach, Kubernetes container orchestration, the Kubernetes architecture, INFINITECH testbeds and INFINITECH sandboxes are explained concisely.

We then delineate tools and techniques for dataset management. This chapter describes the tools and techniques that will be leveraged for the management of datasets within the INFINITECH project, how they are linked and mapped to the concepts and techniques related to testbeds and sandboxes, and how they relate to the blueprint reference testbed environment. We illustrate data sources, data access and dataset management from the blueprint reference environment perspective.

Moreover, we provide an overview of the INFINITECH blueprint reference testbeds. This chapter describes the initial design and implementation of the INFINITECH blueprint reference testbed through the actual realization (in full compliance) of the INFINITECH-RA development and deployment views, in terms of the concrete specification and realization of the fundamental and target INFINITECH concepts of testbeds, sandboxes and dataset management, together with the related tools and techniques for their effective setup and deployment in the INFINITECH pilots and validation scenarios. We discuss the development view, the creation of the EKS INFINITECH cluster, namespace, network and quota policies, and how to recreate the blueprint testbed for a specific INFINITECH pilot.
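For readers unfamiliar with the Kubernetes objects mentioned above, the following Python sketch shows one possible way to generate, for a given pilot, a namespace together with a resource quota and a network policy that only admits traffic from pods in the same namespace. The output is a JSON `v1` `List` manifest that `kubectl apply -f` can consume. The naming scheme (`infinitech-pilot-<n>`), the quota values and the choice of policy are hypothetical assumptions, not the project's actual blueprint configuration.

```python
import json


def pilot_manifests(pilot_id: int, cpu: str = "4", memory: str = "8Gi") -> dict:
    """Build a hypothetical per-pilot Namespace, ResourceQuota and NetworkPolicy."""
    ns = f"infinitech-pilot-{pilot_id}"  # assumed naming convention
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "pilot-quota", "namespace": ns},
        # Caps on the total resources requested by all pods in the namespace.
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory, "pods": "20"}},
    }
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": ns},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # A podSelector without a namespaceSelector matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, quota, policy]}


if __name__ == "__main__":
    # Save the output to a file and apply it with: kubectl apply -f pilot-11.json
    print(json.dumps(pilot_manifests(11), indent=2))
```

Two practical notes: once a quota on `requests.cpu` or `requests.memory` is in place, Kubernetes rejects pods that do not declare those requests unless a LimitRange supplies defaults, and the NetworkPolicy only takes effect if the cluster's network plugin enforces network policies.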

Lastly, we examine the lessons learnt within each pilot. The context for this section within INFINITECH is the innovations that we have explored in the project (e.g., in architectures, processes and business models) and how they promote the global creation and sharing of knowledge (influenced by prior innovations such as GitHub and GitLab, open science and open Networks of Excellence). Similar approaches are increasingly used in agile pilots by governments, large corporations and trans-national researchers to create value for society by combining cross-disciplinary expertise.